CROSS REFERENCE TO RELATED APPLICATION

This application is a Continuation In Part of U.S.
application Serial No. 07/465,863 (Attorney Docket No.
169.0011), filed January 16, 1990, which application is
a Continuation In Part of U.S. application Serial No.
07/405,499 (Attorney Docket No. 169.0007), filed
September 11, 1989, which application is a Continuation
In Part of U.S. application Serial No. 07/398,217
(Attorney Docket No. 169.0003), filed August 25, 1989,
which applications are entitled IMPROVED HLA TYPING
METHOD AND REAGENTS THEREFOR by Malcolm J. Simons. Each
of those applications is incorporated herein by
reference in its entirety.



FIELD OF THE INVENTION

The present invention relates to a method for 
detection of alleles and haplotypes and reagents 
therefor. 



BACKGROUND OF THE INVENTION

Due in part to a number of new analytical
techniques, there has been a significant increase in
knowledge about genetic information, particularly in
humans. Allelic variants of genetic loci have been
correlated to malignant and non-malignant monogenic and
multigenic diseases. For example, monogenic diseases
for which the defective gene has been identified include
Duchenne muscular dystrophy, sickle-cell anemia,
Lesch-Nyhan syndrome, hemophilia, beta-thalassemia,
cystic fibrosis, polycystic kidney disease, ADA
deficiency, alpha-1-antitrypsin deficiency, Wilms' tumor
and retinoblastoma. Other diseases which are believed to
be monogenic for which the gene has not been identified
include fragile X mental retardation and Huntington's
chorea. Genes associated with multigenic diseases such as
diabetes, colon cancer and premature coronary
atherosclerosis have also been identified.

In addition to identifying individuals at risk for
or carriers of genetic diseases, detection of allelic
variants of a genetic locus has been used for organ
transplantation, forensics, disputed paternity and a
variety of other purposes in humans. In commercially
important plants and animals, genes have not only been
analyzed but genetically engineered and transmitted into
other organisms.

A number of techniques have been employed to
detect allelic variants of genetic loci including
analysis of restriction fragment length polymorphic
(RFLP) patterns, use of oligonucleotide probes, and DNA
amplification methods. One of the most complicated
groups of allelic variants, the major histocompatibility
complex (MHC), has been extensively studied. The
problems encountered in attempting to determine the HLA
type of an individual are exemplary of problems
encountered in characterizing other genetic loci.

The major histocompatibility complex is a cluster
of genes that occupy a region on the short arm of
chromosome 6. This complex, denoted the human leukocyte
antigen (HLA) complex, includes at least 50 loci. For
the purposes of HLA tissue typing, two main classes of
loci are recognized. The Class I loci encode
transplantation antigens and are designated A, B and C.
The Class II loci (DRA, DRB, DQA1, DQB, DPA and DPB)
encode products that control immune responsiveness. Of
the Class II loci, all the loci are polymorphic with the
exception of the DRA locus. That is, the DRα antigen
polypeptide sequence is invariant.

HLA determinations are used in paternity
determinations, transplant compatibility testing,
forensics, blood component therapy, anthropological
studies, and in disease association correlations to
diagnose disease or predict disease susceptibility. Due
to the power of HLA to distinguish individuals and the
need to match HLA type for transplantation, analytical
methods to unambiguously characterize the alleles of the
genetic loci associated with the complex have been
sought. At present, DNA typing using RFLP and
oligonucleotide probes has been used to type Class II
locus alleles. Alleles of Class I loci and Class II DR
and DQ loci are typically determined by serological
methods. The alleles of the Class II DP locus are
determined by primed lymphocyte typing (PLT).

Each of the HLA analysis methods has drawbacks. 
Serological methods require standard sera that are not 
widely available and must be continuously replenished. 
Additionally, serotyping is based on the reaction of the 
HLA gene products in the sample with the antibodies in 
the reagent sera. The antibodies recognize the 
expression products of the HLA genes on the surface of 
nucleated cells. The determination of fetal HLA type by 
serological methods may be difficult due to lack of 
maturation of expression of the antigens in fetal blood 
cells. 

Oligonucleotide probe typing can be performed in 
two days and has been further improved by the recent use 
of polymerase chain reaction (PCR) amplification. PCR- 
based oligoprobe typing has been performed on Class II 
loci. Primed lymphocyte typing requires 5 to 10 days to 
complete and involves cell culture with its difficulties 
and inherent variability. 

RFLP analysis is time consuming, requiring about 5
to 7 days to complete. Analysis of the fragment
patterns is complex. Additionally, the technique
requires the use of labelled probes. The most commonly
used label, 32P, presents well known drawbacks
associated with the use of radionuclides.

A fast, reliable method of genetic locus analysis
is highly desirable.

DESCRIPTION OF THE PRIOR ART 

U.S. Patent No. 4,683,195 (to Mullis et al, issued
July 28, 1987) describes a process for amplifying,
detecting and/or cloning nucleic acid sequences. The
method involves treating separate complementary strands
of DNA with two oligonucleotide primers, extending the
primers to form complementary extension products that
act as templates for synthesizing the desired nucleic
acid sequence and detecting the amplified sequence. The
method is commonly referred to as the polymerase chain
reaction sequence amplification method or PCR.
Variations of the method are described in U.S. Patent
No. 4,683,194 (to Saiki et al, issued July 28, 1987).
The polymerase chain reaction sequence amplification
method is also described by Saiki et al, Science,
230:1350-1354 (1985) and Scharf et al, Science,
324:163-166 (1986).

U.S. Patent No. 4,582,788 (to Erlich, issued April
15, 1986) describes an HLA typing method based on
restriction fragment length polymorphism (RFLP) and cDNA
probes used therewith. The method is carried out by
digesting an individual's HLA DNA with a restriction
endonuclease that produces a polymorphic digestion
pattern, subjecting the digest to genomic blotting using
a labelled cDNA probe that is complementary to an HLA
DNA sequence involved in the polymorphism, and comparing
the resulting genomic blotting pattern with a standard.
Locus-specific probes for Class II loci (DQ) are also
described.

Kogan et al, New Engl. J. Med, 317:985-990 (1987)
describes an improved PCR sequence amplification method
that uses a heat-stable polymerase (Taq polymerase) and
high temperature amplification. The stringent
conditions used in the method provide sufficient
fidelity of replication to permit analysis of the
amplified DNA by determining DNA sequence lengths by
visual inspection of an ethidium bromide-stained gel.
The method was used to analyze DNA associated with
hemophilia A in which additional tandem repeats of a DNA
sequence are associated with the disease and the
amplified sequences were significantly longer than
sequences that are not associated with the disease.

Simons and Erlich, pp 952-958 In: Immunology of
HLA Vol. 1: Springer-Verlag, New York (1989) summarized
RFLP-sequence interrelations at the DPA and DPB loci.
RFLP fragment patterns analyzed with probes by Southern
blotting provided distinctive patterns for DPw1-5
alleles and the corresponding DPB1 allele sequences,
characterized two subtypic patterns for DPw2 and DPw4,
and identified new DPw alleles.

Simons et al, pp 959-1023 In: Immunology of HLA
Vol. 1: Springer-Verlag, New York (1989) summarized
restriction fragment length polymorphisms of HLA
sequences for class II loci as determined by the 10th
International Workshop Southern Blot Analysis. Southern
blot analysis was shown to be suitable for typing of the
major classes of HLA loci.

A series of three articles [Rommens et al, Science
245:1059-1065 (1989), Riordan et al, Science
245:1066-1072 (1989) and Kerem et al, Science
245:1073-1079 (1989)] report a new gene analysis method
called "jumping" used to identify the location of the CF
gene, the sequence of the CF gene, and the defect in the
gene and its percentage in the disease population,
respectively.

DiLella et al, The Lancet i:497-499 (1988)
describes a screening method for detecting the two major
alleles responsible for phenylketonuria in Caucasians of
Northern European descent. The mutations, located at
about the center of exon 12 and at the exon 12 junction
with intervening sequence 12, are detected by PCR
amplification of a 245 bp region of exon 12 and flanking
intervening sequences. The amplified sequence
encompasses both mutations and is analyzed using probes
specific for each of the alleles (without prior
electrophoretic separation).

Dicker et al, BioTechniques 7:830-837 (1989) and 
Mardis et al, BioTechniques 7:840-850 (1989) report on 
automated techniques for sequencing of DNA sequences, 
particularly PCR-generated sequences. 

Each of the above-described references is 
incorporated herein by reference in its entirety. 

SUMMARY OF THE INVENTION 

The present invention provides a method for 
detection of at least one allele of a genetic locus and 
can be used to provide direct determination of the 
haplotype. The method comprises amplifying genomic DNA 
with a primer pair that spans an intron sequence and 
defines a DNA sequence in genetic linkage with an allele 
to be detected. The primer-defined DNA sequence 
contains a sufficient number of intron sequence 
nucleotides to characterize the allele. Genomic DNA is 
amplified to produce an amplified DNA sequence 
characteristic of the allele. The amplified DNA 
sequence is analyzed to detect the presence of a genetic 
variation in the amplified DNA sequence such as a change 
in the length of the sequence, gain or loss of a 
restriction site or substitution of a nucleotide. The 
variation is characteristic of the allele to be 
detected. 

The present invention is based on the finding that 
intron sequences contain genetic variations that are 
characteristic of adjacent and remote alleles on the 
same chromosome. In particular, DNA sequences that
include a sufficient number of intron sequence
nucleotides can be used for direct determination of 
haplotype. 

The method can be used to detect alleles of 
genetic loci for any eukaryotic organism. Of particular 
interest are loci associated with malignant and
nonmalignant monogenic and multigenic diseases, and
identification of individual organisms or species in 
both plants and animals. In a preferred embodiment, the 
method is used to determine HLA allele type and 
haplotype. 

Kits comprising one or more of the reagents used 
in the method are also described. 

DETAILED DESCRIPTION OF THE INVENTION 

The present invention provides a method for 
detection of alleles and haplotypes through analysis of 
intron sequence variation. The present invention is 
based on the discovery that amplification of intron 
sequences that exhibit linkage disequilibrium with 
adjacent and remote loci can be used to detect alleles 
of those loci. The present method reads haplotypes as 
the direct output of the intron typing analysis when a 
single, individual organism is tested. The method is 
particularly useful in humans but is generally 
applicable to all eukaryotes, and is preferably used to 
analyze plant and animal species. 

The method comprises amplifying genomic DNA with a 
primer pair that spans an intron sequence and defines a 
DNA sequence in genetic linkage with an allele to be 
detected. Primer sites are located in conserved regions 
in the introns or exons bordering the intron sequence to 
be amplified. The primer-defined DNA sequence contains 
a sufficient number of intron sequence nucleotides to 
characterize the allele. The amplified DNA sequence is 
analyzed to detect the presence of a genetic variation
such as a change in the length of the sequence, gain or
loss of a restriction site or substitution of a 
nucleotide. 

The intron sequences provide genetic variations
that, in addition to those found in exon sequences,
further distinguish sample DNA, providing additional
information about the individual organism. This
information is particularly valuable for identification
of individuals such as in paternity determinations and
in forensic applications. The information is also
valuable in any other application where heterozygotes
(two different alleles) are to be distinguished from
homozygotes (two copies of one allele).

More specifically, the present invention provides
information regarding intron variation. Using the
methods and reagents of this invention, two types of
intron variation associated with genetic loci have been
found. The first is allele-associated intron variation.
That is, the intron variation pattern associates with
the allele type at an adjacent locus. The second type
of variation is associated with remote alleles
(haplotypes). That is, the variation is present in
individual organisms with the same genotype at the
primary locus. Differences may occur between sequences
of the same adjacent and remote locus types. However,
individual-limited variation is uncommon.

Furthermore, an amplified DNA sequence that
contains sufficient intron sequences will vary depending
on the allele present in the sample. That is, the
introns contain genetic variations (e.g. length
polymorphisms due to insertions and/or deletions and
changes in the number or location of restriction sites)
which are associated with the particular allele of the
locus and with the alleles at remote loci.

The reagents used in carrying out the methods of
this invention are also described. The reagents can be
provided in kit form comprising one or more of the
reagents used in the method.



Definitions 

The term "allele", as used herein, means a genetic
variation associated with a coding region; that is, an
alternative form of the gene.

The term "linkage", as used herein, refers to the
degree to which regions of genomic DNA are inherited
together. Regions on different chromosomes do not
exhibit linkage and are inherited together 50% of the
time. Adjacent genes that are always inherited together
exhibit 100% linkage.

The term "linkage disequilibrium", as used herein,
refers to the co-occurrence of two alleles at linked
loci such that the frequency of the co-occurrence of the
alleles is greater than would be expected from the
separate frequencies of occurrence of each allele.
Alleles that co-occur with frequencies expected from
their separate frequencies are said to be in "linkage
equilibrium".
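
The definition of linkage disequilibrium can be made concrete with a short calculation. The following Python sketch is illustrative only: the function name and the example frequencies are hypothetical and are not taken from this application. It computes the difference between the observed frequency of co-occurrence and the product of the separate allele frequencies, a quantity commonly written D. A positive value corresponds to linkage disequilibrium as defined above, and a value near zero corresponds to linkage equilibrium.

```python
def disequilibrium(freq_ab, freq_a, freq_b):
    """Observed co-occurrence frequency of two alleles minus the
    frequency expected if the alleles occurred independently."""
    return freq_ab - freq_a * freq_b


# Hypothetical population frequencies, for illustration only.
d = disequilibrium(freq_ab=0.15, freq_a=0.20, freq_b=0.30)
print(f"D = {d:.2f}")  # the expected co-occurrence here is 0.20 * 0.30
```

A real analysis would also need a significance test on the sampled counts; this sketch shows only the comparison the definition describes.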

As used herein, "haplotype" is a region of genomic
DNA on a chromosome which is bounded by recombination
sites such that genetic loci within a haplotypic region
are usually inherited as a unit. However, occasionally,
genetic rearrangements may occur within a haplotype.
Thus, the term haplotype is an operational term that
refers to the occurrence on a chromosome of linked loci.

As used herein, the term "intron" refers to
untranslated DNA sequences between exons, together with
5' and 3' untranslated regions associated with a genetic
locus. In addition, the term is used to refer to the
spacing sequences between genetic loci (intergenic
spacing sequences) which are not associated with a
coding region and are colloquially referred to as
"junk". While the art traditionally uses the term
"intron" to refer only to untranslated sequences between
exons, this expanded definition was necessitated by the
lack of any art recognized term which encompasses all
non-exon sequences.

As used herein, an "intervening sequence" is an 
intron which is located between two exons within a gene. 
The term does not encompass upstream and downstream 
noncoding sequences associated with the genetic locus.

As used herein, the term "amplified DNA sequence" 
refers to DNA sequences which are copies of a portion of 
a DNA sequence and its complementary sequence, which 
copies correspond in nucleotide sequence to the original 
DNA sequence and its complementary sequence. 

The term "complement", as used herein, refers to a 
DNA sequence that is complementary to a specified DNA 
sequence. 

The term "primer site", as used herein, refers to 
the area of the target DNA to which a primer hybridizes. 

The term "primer pair", as used herein, means a
set of primers including a 5' upstream primer that
hybridizes with the 5' end of the DNA sequence to be
amplified and a 3' downstream primer that hybridizes
with the complement of the 3' end of the sequence to be
amplified.
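
How a primer pair defines the ends of an amplified sequence can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the method of the invention: hybridization is modeled as exact string matching on one strand written 5' to 3', and the function names and the toy sequences are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def amplified_sequence(template, upstream_primer, downstream_primer):
    """Return the region of `template` bounded by the primer pair,
    or None if either primer site is absent (no amplification)."""
    start = template.find(upstream_primer)
    if start < 0:
        return None
    # The downstream primer hybridizes with the complementary strand, so
    # its reverse complement is what appears on the template strand.
    site = reverse_complement(downstream_primer)
    end = template.find(site, start + len(upstream_primer))
    if end < 0:
        return None
    return template[start:end + len(site)]


# Toy template and primers, for illustration only.
template = "TTGACCATGAAACCCGGGTTTACGTAGGCTT"
product = amplified_sequence(template, "GACCATG", "CCTACGT")
print(product, len(product))
```

The returned string starts with the upstream primer sequence and ends with the reverse complement of the downstream primer, which is why the primer pair is said to define the amplified DNA sequence.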

The term "exon-limited primers", as used herein, 
means a primer pair having primers located within or 
just outside of an exon in a conserved portion of the 
intron, which primers amplify a DNA sequence which 
includes an exon or a portion thereof and not more than 
a small, para-exon region of the adjacent intron(s).

The term "intron-spanning primers", as used
herein, means a primer pair that amplifies at least a
portion of one intron, which amplified intron region
includes sequences which are not conserved. The
intron-spanning primers can be located in conserved
regions of the introns or in adjacent, upstream and/or
downstream exon sequences.

The term "genetic locus", as used herein, means
the region of the genomic DNA that includes the gene
that encodes a protein, including any upstream or
downstream transcribed noncoding regions and associated
regulatory regions. Therefore, an HLA locus is the 
region of the genomic DNA that includes the gene that 
encodes an HLA gene product. 

As used herein, the term "adjacent locus" refers 
to either (1) the locus in which a DNA sequence is 
located or (2) the nearest upstream or downstream 
genetic locus for intron DNA sequences not associated 
with a genetic locus. 

As used herein, the term "remote locus" refers to 
either (1) a locus which is upstream or downstream from 
the locus in which a DNA sequence is located or (2) for 
intron sequences not associated with a genetic locus, a 
locus which is upstream or downstream from the nearest 
upstream or downstream genetic locus to the intron 
sequence. 

The term "locus-specific primer", as used herein, 
means a primer that specifically hybridizes with a 
portion of the stated gene locus or its complementary 
strand, at least for one allele of the locus, and does 
not hybridize with other DNA sequences under the 
conditions used in the amplification method. 

As used herein, the terms "endonuclease" and 
"restriction endonuclease" refer to an enzyme that cuts 
double-stranded DNA having a particular nucleotide 
sequence. The specificities of numerous endonucleases 
are well known and can be found in a variety of 
publications, e.g. Molecular Cloning: A Laboratory 
Manual by Maniatis et al, Cold Spring Harbor Laboratory 
1982. That manual is incorporated herein by reference 
in its entirety.

The term "restriction fragment length
polymorphism" (or RFLP), as used herein, refers to
differences in DNA nucleotide sequences that produce
fragments of different lengths when cleaved by a
restriction endonuclease.
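
The RFLP definition can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: only one strand is considered, digestion is complete, and the two toy sequences are hypothetical. The recognition sequence GAATTC with a cut after the first base is that of EcoRI; the function name and parameters are illustrative.

```python
def fragment_lengths(seq, site, cut_offset):
    """Lengths of the fragments produced by cutting `seq` at every
    occurrence of `site`, `cut_offset` bases into the site."""
    cuts = []
    pos = seq.find(site)
    while pos >= 0:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Two hypothetical sequences that differ by one nucleotide, which
# removes the second GAATTC site from the second sequence.
allele_1 = "AAAAGAATTCCCCCCCCCGAATTCTT"
allele_2 = "AAAAGAATTCCCCCCCCCGAGTTCTT"
print(fragment_lengths(allele_1, "GAATTC", 1))
print(fragment_lengths(allele_2, "GAATTC", 1))
```

The single substitution changes the set of fragment lengths, which is the kind of difference the term RFLP refers to.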

The term "primer-defined length polymorphisms" (or
PDLP), as used herein, refers to differences in the
lengths of amplified DNA sequences due to insertions or
deletions in the intron region of the locus included in
the amplified DNA sequence.
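
A primer-defined length polymorphism can be read directly from the lengths of the amplified sequences. The Python sketch below is a hedged illustration: the allele names, the toy sequences and the `min_difference` parameter (standing in for the smallest length difference the separation method can resolve) are all hypothetical.

```python
def length_pattern(amplified_sequences):
    """Map each allele name to the length of its amplified sequence."""
    return {name: len(seq) for name, seq in amplified_sequences.items()}


def distinguishable_by_length(amplified_sequences, min_difference=1):
    """True if every pair of amplified sequences differs in length
    by at least `min_difference` nucleotides."""
    lengths = sorted(len(seq) for seq in amplified_sequences.values())
    return all(b - a >= min_difference for a, b in zip(lengths, lengths[1:]))


# Hypothetical amplified sequences: allele_y carries a 4 nt insertion
# and allele_z a 4 nt deletion relative to allele_x.
products = {
    "allele_x": "ACGT" * 5,
    "allele_y": "ACGT" * 5 + "TTTT",
    "allele_z": "ACGT" * 4,
}
print(length_pattern(products))
print(distinguishable_by_length(products, min_difference=4))
```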

The term "HLA DNA", as used herein, means DNA that 
includes the genes that encode HLA antigens. HLA DNA is 
found in all nucleated human cells. 

Primers

The method of this invention is based on
amplification of selected intron regions of genomic DNA.
The methodology is facilitated by the use of primers
that selectively amplify DNA associated with one or more
alleles of a genetic locus of interest and not with
other genetic loci.

A locus-specific primer pair contains a 5'
upstream primer that defines the 5' end of the amplified
sequence by hybridizing with the 5' end of the target
sequence to be amplified and a 3' downstream primer that
defines the 3' end of the amplified sequence by
hybridizing with the complement of the 3' end of the DNA
sequence to be amplified. The primers in the primer
pair do not hybridize with DNA of other genetic loci
under the conditions used in the present invention.

For each primer of the locus-specific primer pair,
the primer hybridizes to at least one allele of the DNA
locus to be amplified or to its complement. A primer
pair can be prepared for each allele of a selected
locus, which primer pair amplifies only DNA for the
selected locus. In this way combinations of primer
pairs can be used to amplify genomic DNA of a particular
locus, irrespective of which allele is present in a
sample. Preferably, the primer pair amplifies DNA of at
least two, more preferably more than two, alleles of a
locus. In a most preferred embodiment, the primer sites
are conserved, and the primer pair thus amplifies all
haplotypes. However, primer pairs or combinations
thereof that specifically bind with the most common
alleles present in a particular population group are
also contemplated.

The amplified DNA sequence that is defined by the
primers contains a sufficient number of intron sequence
nucleotides to distinguish between at least two alleles
of an adjacent locus, and preferably, to identify the
allele of the locus which is present in the sample. For
some purposes, the sequence can also be selected to
contain sufficient genetic variations to distinguish
between individual organisms with the same allele or to
distinguish between haplotypes.

Length of sequence

The length of the amplified sequence which is
required to include sufficient genetic variability to
enable discrimination between all alleles of a locus
bears a direct relation to the extent of the
polymorphism of the locus (the number of alleles). That
is, as the number of alleles of the tested locus
increases, the size of an amplified sequence which
contains sufficient genetic variations to identify each
allele increases. For a particular population group,
one or more of the recognized alleles for any given
locus may be absent from that group and need not be
considered in determining a sequence which includes
sufficient variability for that group. Conveniently,
however, the primer pairs are selected to amplify a DNA
sequence which is sufficient to distinguish between all
recognized alleles of the tested locus. The same
considerations apply when a haplotype is determined.

For example, the least polymorphic HLA locus is
DPA which currently has four recognized alleles. For
that locus, a primer pair which amplifies only a portion
of the variable exon encoding the allelic variation
contains sufficient genetic variability to distinguish
between the alleles when the primer sites are located in
an appropriate region of the variable exon.
Exon-limited primers can be used to produce an amplified
sequence that includes as few as about 200 nucleotides
(nt). However, as the number of alleles of the locus
increases, the number of genetic variations in the
sequence must increase to distinguish all alleles.
Addition of invariant exon sequences provides no
additional genetic variation. When about eight or more
alleles are to be distinguished, as for the DQA1 locus
and more variable loci, amplified sequences should
extend into at least one intron in the locus, preferably
an intron adjacent to the variable exon.

Additionally, where alleles of the locus exist
which differ by a single basepair in the variable exon,
intron sequences are included in amplified sequences to
provide sufficient variability to distinguish alleles.
For example, for the DQA1 locus (with eight currently
recognized alleles) and the DPB locus (with 24 alleles),
the DQA1.1/1.2 (now referred to as DQA1 0101/0102) and
DPB2.1/4.2 (now referred to as DPB0201/0402) alleles
differ by a single basepair. To distinguish those
alleles, amplified sequences which include an intron
sequence region are required. About 300 to 500
nucleotides is sufficient, depending on the location of
the sequence. That is, 300 to 500 nucleotides comprised
primarily of intron sequence nucleotides sufficiently
close to the variable exon are sufficient.

For loci with more extensive polymorphisms (such
as DQB with 14 currently recognized alleles, DPB with 24
currently recognized alleles, DRB with 34 currently
recognized alleles and for each of the Class I loci),
the amplified sequences need to be larger to provide
sufficient variability to distinguish between all the
alleles. An amplified sequence that includes at least
about 0.5 kilobases (Kb), preferably at least about
1.0 Kb, more preferably at least about 1.5 Kb generally
provides a sufficient number of restriction sites for
loci with extensive polymorphisms. The amplified
sequences used to characterize highly polymorphic loci
are generally between about 800 to about 2,000
nucleotides (nt), preferably between about 1000 to about
1800 nucleotides in length.

When haplotype information regarding remote
alleles is desired, the sequences are generally between
about 1,000 to about 2,000 nt in length. Longer
sequences are required when the amplified sequence
encompasses highly conserved regions such as exons or
highly conserved intron regions, e.g., promoters,
operators and other DNA regulatory regions. Longer
amplified sequences (including more intron nucleotide
sequences) are also required as the distance between the
amplified sequences and the allele to be detected
increases.

Highly conserved regions included in the amplified
DNA sequence, such as exon sequences or highly conserved
intron sequences (e.g. promoters, enhancers, or other
regulatory regions) may provide little or no genetic
variation. Therefore, such regions do not contribute,
or contribute only minimally, to the genetic variations
present in the amplified DNA sequence. When such
regions are included in the amplified DNA sequence,
additional nucleotides may be required to encompass
sufficient genetic variations to distinguish alleles, in
comparison to an amplified DNA sequence of the same
length including only intron sequences.

Location of the amplified DNA sequence

The amplified DNA sequence is located in a region 
of genomic DNA that contains genetic variation which is 
in genetic linkage with the allele to be detected. 
Preferably, the sequence is located in an intron 
sequence adjacent to an exon of the genetic locus. More 
preferably, the amplified sequence includes an 
intervening sequence adjacent to an exon that encodes 
the allelic variability associated with the locus (a
variable exon). The sequence preferably includes at
least a portion of one of the introns adjacent to a 
variable exon and can include a portion of the variable 
exon. When additional sequence information is required, 
the amplified DNA sequence preferably encompasses a 
variable exon and all or a portion of both adjacent 
intron sequences. 

Alternatively, the amplified sequence can be in an 
intron which does not border an exon of the genetic 
locus. Such introns are located in the downstream or 
upstream gene flanking regions or even in an intervening 
sequence in another genetic locus which is in linkage 
disequilibrium with the allele to be detected. 

For some genetic loci, genomic DNA sequences may
not be available. When only cDNA sequences are
available and intron locations within the sequence are
not identified, primers are selected at intervals of
about 200 nt and used to amplify genomic DNA. If the
amplified sequence contains about 200 nt, the location
of the first primer is moved about 200 nt to one side of
the second primer location and the amplification is
repeated until either (1) an amplified DNA sequence that
is larger than expected is produced or (2) no amplified
DNA sequence is produced. In either case, the location
of an intron sequence has been determined. The same
methodology can be used when only the sequence of a
marker site that is highly linked to the genetic locus
is available, as is the case for many genes associated
with inherited diseases.
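
The stepwise search for an intron using cDNA-derived primers can be summarized as a small decision procedure. The Python sketch below is only an illustration of that logic: the function names are hypothetical, the observed product sizes are made-up inputs, and `tolerance_nt` is an assumed allowance for sizing error that the text does not specify. An observed size of `None` stands for "no amplified DNA sequence is produced".

```python
def interval_signals_intron(expected_nt, observed_nt, tolerance_nt=20):
    """True if one amplification indicates an intron between the two
    primer sites: either no product, or a product larger than expected."""
    if observed_nt is None:
        return True
    return observed_nt > expected_nt + tolerance_nt


def first_intron_interval(observed_by_interval, interval_nt=200):
    """Walk along the cDNA one interval at a time and return the index
    of the first interval that signals an intron, or None if none does."""
    for index, observed_nt in enumerate(observed_by_interval):
        if interval_signals_intron(interval_nt, observed_nt):
            return index
    return None


# Hypothetical product sizes (nt) for successive 200 nt cDNA intervals.
print(first_intron_interval([200, 205, 950]))   # third interval is oversized
print(first_intron_interval([198, None]))       # second interval gives no product
```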

When the amplified DNA sequence does not include
all or a portion of an intron adjacent to the variable
exon(s), the sequence must also satisfy a second
requirement. The amplified sequence must be
sufficiently close to the variable exon(s) to exclude
recombination and loss of linkage disequilibrium between
the amplified sequence and the variable exon(s). This
requirement is satisfied if the regions of the genomic
DNA are within about 5 Kb, preferably within about 4 Kb,
most preferably within 2 Kb of the variable exon(s).
The amplified sequence can be outside of the genetic
locus but is preferably within the genetic locus.

Preferably, for each primer pair, the amplified
DNA sequence defined by the primers includes at least
200 nucleotides, and more preferably at least 400
nucleotides, of an intervening sequence adjacent to the
variable exon(s). Although the variable exon usually
provides fewer variations in a given number of
nucleotides than an adjacent intervening sequence, each
of those variations provides allele-relevant
information. Therefore, inclusion of the variable exon
provides an advantage.

Since PCR methodology can be used to amplify
sequences of several Kb, the primers can be located so
that additional exons or intervening sequences are
included in the amplified sequence. Of course, the
increased size of the amplified DNA sequence increases
the chance of replication error, so addition of
invariant regions provides some disadvantages. However,
those disadvantages are not as likely to affect an
analysis based on the length of the sequence or the RFLP
fragment patterns as one based on sequencing the
amplification product. For particular alleles,
especially those with highly similar exon sequences,
amplified sequences of greater than about 1 or 1.5 Kb
may be necessary to discriminate between all alleles of
a particular locus.

The ends of the amplified DNA sequence are defined
by the primer pair used in the amplification. Each
primer sequence must correspond to a conserved region of
the genomic DNA sequence. Therefore, the location of
the amplified sequence will, to some extent, be dictated
by the need to locate the primers in conserved regions.
When sufficient intron sequence information to determine
conserved intron regions is not available, the primers
can be located in conserved portions of the exons and
used to amplify intron sequences between those exons.

When appropriately located, conserved sequences
are not unique to the genetic locus, a second primer
pair located within the amplified sequence produced by
the first primer pair can be used to provide an amplified
DNA sequence specific for the genetic locus. At least
one of the primers of the second primer pair is located
in a conserved region of the amplified DNA sequence
defined by the first primer pair. The second primer
pair is used following amplification with the first
primer pair to amplify a portion of the amplified DNA
sequence produced by the first primer pair.

There are three major types of genetic variations
that can be detected and used to identify an allele.
Those variations, in order of ease of detection, are
(1) a change in the length of the sequence, (2) a change
in the presence or location of at least one restriction
site and (3) the substitution of one or a few
nucleotides that does not result in a change in a
restriction site. Other variations within the amplified
DNA sequence are also detectable.

There are three types of techniques which can be
used to detect the variations. The first is sequencing
the amplified DNA sequence. Sequencing is the most time
consuming and also the most revealing analytical method,
since it detects any type of genetic variation in the
amplified sequence. The second analytical method uses
allele-specific oligonucleotide or sequence-specific
oligonucleotide probes (ASO or SSO probes). Probes can
detect single nucleotide changes which result in any of
the types of genetic variations, so long as the exact
sequence of the variable site is known. A third type of
analytical method detects sequences of different lengths
(e.g., due to an insertion or deletion or a change in
the location of a restriction site) and/or different
numbers of sequences (due to either gain or loss of
restriction sites). A preferred detection method is by
gel or capillary electrophoresis. To detect changes in
the lengths of fragments or the number of fragments due
to changes in restriction sites, the amplified sequence
must be digested with an appropriate restriction
endonuclease prior to analysis of fragment length
patterns.

The first genetic variation is a difference in the
length of the primer-defined amplified DNA sequence,
referred to herein as a primer-defined length
polymorphism (PDLP), which difference in length
distinguishes between at least two alleles of the
genetic locus. The PDLPs result from insertions or
deletions of large stretches (in comparison to the total
length of the amplified DNA sequence) of DNA in the
portion of the intron sequence defined by the primer
pair. To detect PDLPs, the amplified DNA sequence is
located in a region containing insertions or deletions
of a size that is detectable by the chosen method. The
amplified DNA sequence should have a length which
provides optimal resolution of length differences. For
electrophoresis, DNA sequences of about 300 to 500 bases
in length provide optimal resolution of length
differences. Nucleotide sequences which differ in
length by as few as 3 nt, preferably 25 to 50 nt, can be
distinguished. However, sequences as long as 800 to
2,000 nt which differ by at least about 50 nt are also
readily distinguishable. Gel electrophoresis and
capillary electrophoresis have similar limits of
resolution. Preferably the length differences between
amplified DNA sequences will be at least 10, more
preferably 20, most preferably 50 or more, nt between
the alleles. Preferably, the amplified DNA sequence is
between 300 and 1,000 nt and encompasses length
differences of at least 3, preferably 10 or more nt.
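
The length criteria above can be illustrated with a
short Python sketch. The function name, the allele
names and the amplicon lengths are hypothetical and are
not taken from the examples; the default threshold of
10 nt is the preferred minimum length difference stated
above.

```python
def pdlp_distinguishable(lengths_nt, min_diff_nt=10):
    """Return True if every pair of allele amplicon lengths differs
    by at least min_diff_nt nucleotides.

    lengths_nt maps an allele name to the length (nt) of its
    primer-defined amplified sequence.
    """
    ordered = sorted(lengths_nt.values())
    # If all neighbours in sorted order differ enough, all pairs do.
    return all(b - a >= min_diff_nt for a, b in zip(ordered, ordered[1:]))


# Hypothetical amplicon lengths for three alleles of one locus.
amplicons = {"allele_1": 410, "allele_2": 435, "allele_3": 485}

print(pdlp_distinguishable(amplicons))                  # 10 nt threshold
print(pdlp_distinguishable(amplicons, min_diff_nt=50))  # 50 nt threshold
```

With these assumed lengths the alleles are separable at
the 10 nt threshold but not at the most preferred 50 nt
threshold, since two of them differ by only 25 nt.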

Preferably, the amplified sequence is located in
an area which provides PDLP sequences that distinguish
most or all of the alleles of a locus. An example of
PDLP-based identification of five of the eight DQA1
alleles is described in detail in the examples.

When the variation to be detected is a change in a
restriction site, the amplified DNA sequence necessarily
contains at least one restriction site which (1) is
present in one allele and not in another, (2) is
apparently located in a different position in the
sequence of at least two alleles, or (3) combinations
thereof. The amplified sequence will preferably be
located such that restriction endonuclease cleavage
produces fragments of detectably different lengths,
rather than two or more fragments of approximately the
same length.
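
The effect of a restriction site that is present in one
allele and absent in another can be sketched in Python
as follows. The two allele sequences are artificial, the
function name is hypothetical, and EcoRI (recognition
site GAATTC, cleaving after the first G) is chosen only
as a familiar example; none of these is taken from the
examples of this application.

```python
def digest(seq, site, cut_offset):
    """Return the fragment lengths produced by cutting seq at every
    occurrence of site. cut_offset is the number of bases of the
    site that stay on the left-hand fragment (top strand only)."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Artificial 306 nt amplicons; allele_b has one substitution in the site.
allele_a = "A" * 120 + "GAATTC" + "C" * 180
allele_b = "A" * 120 + "GAGTTC" + "C" * 180

print(digest(allele_a, "GAATTC", 1))  # [121, 185]
print(digest(allele_b, "GAATTC", 1))  # [306]
```

The site is placed off-centre so that cleavage yields
fragments of clearly different lengths rather than two
fragments of about the same length.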

For allelic differences detected by ASO or SSO
probes, the amplified DNA sequence includes a region of
from about 200 to about 400 nt which is present in one
or more alleles and not present in one or more other
alleles. In a most preferred embodiment, the sequence
contains a region detectable by a probe that is present
in only one allele of the genetic locus. However,
combinations of probes which react with some alleles and
not others can be used to characterize the alleles.



For the method described herein, it is
contemplated that use of more than one amplified DNA
sequence and/or use of more than one analytical method
per amplified DNA sequence may be required for highly
polymorphic loci, particularly for loci where alleles
differ by single nucleotide substitutions that are not
unique to the allele or when information regarding
remote alleles (haplotypes) is desired. More
particularly, it may be necessary to combine a PDLP
analysis with an RFLP analysis, to use two or more
amplified DNA sequences located in different positions
or to digest a single amplified DNA sequence with a
plurality of endonucleases to distinguish all the
alleles of some loci. These combinations are intended
to be included within the scope of this invention.

For example, the analysis of the haplotypes of the
DQA1 locus described in the examples uses PDLPs and RFLP
analysis using three different enzyme digests to
distinguish the eight alleles and 20 of the 32
haplotypes of the locus.

Length and sequence homology of primers

Each locus-specific primer includes a number of
nucleotides which, under the conditions used in the
hybridization, are sufficient to hybridize with an
allele of the locus to be amplified and to be free from
hybridization with alleles of other loci. The
specificity of the primer increases with the number of
nucleotides in its sequence under conditions that
provide the same stringency. Therefore, longer primers
are desirable. Sequences with fewer than 15 nucleotides
are less certain to be specific for a particular locus.
That is, sequences with fewer than 15 nucleotides are
more likely to be present in a portion of the DNA
associated with other genetic loci, particularly loci of
other common origin or evolutionarily closely related
origin, in inverse proportion to the length of the
nucleotide sequence.

Each primer preferably includes at least about 15
nucleotides, more preferably at least about 20
nucleotides. The primer preferably does not exceed
about 30 nucleotides, more preferably about 25
nucleotides. Most preferably, the primers have between
about 20 and about 25 nucleotides.

A number of preferred primers are described
herein. Each of those primers hybridizes with at least
about 15 consecutive nucleotides of the designated
region of the allele sequence. For many of the primers,
the sequence is not identical for all of the other
alleles of the locus. For each of the primers,
additional preferred primers have sequences which
correspond to the sequences of the homologous region of
other alleles of the locus or to their complements.

When two sets of primer pairs are used
sequentially, with the second primer pair amplifying the
product of the first primer pair, the primers can be the
same size as those used for the first amplification.
However, smaller primers can be used in the second
amplification and provide the requisite specificity.
These smaller primers can be selected to be allele-
specific, if desired. The primers of the second primer
pair can have 15 or fewer, preferably 8 to 12, more
preferably 8 to 10 nucleotides. When two sets of primer
pairs are used to produce two amplified sequences, the
second amplified DNA sequence is used in the subsequent
analysis of genetic variation and must meet the
requirements discussed previously for the amplified DNA
sequence.

The primers preferably have a nucleotide sequence
that is identical to a portion of the DNA sequence to be
amplified or its complement. However, a primer having
two nucleotides that differ from the target DNA sequence
or its complement also can be used. Any nucleotides
that are not identical to the sequence or its complement
are preferably not located at the 3' end of the primer.
The 3' end of the primer preferably has at least two,
preferably three or more, nucleotides that are
complementary to the sequence to which the primer binds.
Any nucleotides that are not identical to the sequence
to be amplified or its complement will preferably not be
adjacent in the primer sequence. More preferably,
noncomplementary nucleotides in the primer sequence will
be separated by at least three, more preferably at least
five, nucleotides. The primers should have a melting
temperature (Tm) from about 55 to 75 °C. Preferably the
Tm is from about 60 °C to about 65 °C to facilitate
stringent amplification conditions.
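
The primer preferences above (length, number and
placement of mismatches, and Tm) can be collected into a
simple check, sketched here in Python. The function
names, the example primer and the default parameter
values are illustrative assumptions. In particular, the
Tm estimate uses the rule of thumb 2 °C per A or T plus
4 °C per G or C, which is only a rough approximation for
short oligonucleotides and is not a method stated in
this application.

```python
def wallace_tm(primer):
    """Rough Tm estimate (deg C): 2*(A+T) + 4*(G+C).
    Assumed rule of thumb; not taken from the source text."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def primer_issues(primer, target, max_mismatches=2,
                  three_prime_clamp=3, min_spacing=3):
    """List departures from the stated primer preferences.

    primer and target are equal-length strings written 5'->3';
    target is the stretch of sequence (or its complement) that the
    primer is intended to be identical to.
    """
    if len(primer) != len(target):
        raise ValueError("primer and target must have the same length")
    n = len(primer)
    issues = []
    if not 15 <= n <= 30:
        issues.append("length outside about 15 to 30 nt")
    mismatches = [i for i, (a, b) in
                  enumerate(zip(primer.upper(), target.upper())) if a != b]
    if len(mismatches) > max_mismatches:
        issues.append("too many mismatches")
    if any(i >= n - three_prime_clamp for i in mismatches):
        issues.append("mismatch near the 3' end")
    if any(b - a - 1 < min_spacing for a, b in zip(mismatches, mismatches[1:])):
        issues.append("mismatches too close together")
    if not 55 <= wallace_tm(primer) <= 75:
        issues.append("estimated Tm outside about 55 to 75 deg C")
    return issues


primer = "AGCTGACCTGAAGTCCATGG"       # hypothetical 20-mer
print(wallace_tm(primer))              # 62
print(primer_issues(primer, primer))   # []
```

A primer that differs from its target only at the final
3' nucleotide would be flagged by this check for the 3'
end mismatch alone.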

The primers can be prepared using a number of
methods, such as, for example, the phosphotriester and
phosphodiester methods or automated embodiments thereof.
The phosphodiester and phosphotriester methods are
described in Caruthers, Science 230:281-285 (1985); Brown
et al, Meth. Enzymol., 68:109 (1979); and Narang et al,
Meth. Enzymol., 68:90 (1979). In one automated method,
diethylphosphoramidites which can be synthesized as
described by Beaucage et al, Tetrahedron Letters,
22:1859-1862 (1981) are used as starting materials. A
method for synthesizing primer oligonucleotide sequences
on a modified solid support is described in U.S. Pat.
No. 4,458,066. Each of the above references is
incorporated herein by reference in its entirety.

Exemplary primer sequences for analysis of Class I
and Class II HLA loci, bovine leukocyte antigens, and
cystic fibrosis are described herein.

Amplification

The locus-specific primers are used in an
amplification process to produce a sufficient amount of
DNA for the analysis method. For production of RFLP
fragment patterns or PDLP patterns which are analyzed by
electrophoresis, about 1 to about 500 ng of DNA is
required. A preferred amplification method is the
polymerase chain reaction (PCR). PCR amplification
methods are described in U.S. Patent No. 4,683,195 (to
Mullis et al, issued July 28, 1987); U.S. Patent No.
4,683,194 (to Saiki et al, issued July 28, 1987); Saiki
et al, Science, 230:1350-1354 (1985); Scharf et al,
Science, 324:163-166 (1986); Kogan et al, New Engl. J.
Med, 317:985-990 (1987) and Saiki, Gyllensten and
Erlich, The Polymerase Chain Reaction in Genome
Analysis: A Practical Approach, ed. Davies pp. 141-152,
(1988) I.R.L. Press, Oxford. Each of the above
references is incorporated herein by reference in its
entirety.

Prior to amplification, a sample of the individual
organism's DNA is obtained. All nucleated cells contain
genomic DNA and, therefore, are potential sources of the
required DNA. For higher animals, peripheral blood
cells are typically used rather than tissue samples. As
little as 0.01 to 0.05 cc of peripheral blood provides
sufficient DNA for amplification. Hair, semen and
tissue can also be used as samples. In the case of
fetal analyses, placental cells or fetal cells present
in amniotic fluid can be used. The DNA is isolated from
nucleated cells under conditions that minimize DNA
degradation. Typically, the isolation involves
digesting the cells with a protease that does not attack
DNA at a temperature and pH that reduces the likelihood
of DNase activity. For peripheral blood cells, lysing
the cells with a hypotonic solution (water) is
sufficient to release the DNA.

DNA isolation from nucleated cells is described by
Kan et al, N. Engl. J. Med. 297:1080-1084 (1977); Kan et
al, Nature 251:392-392 (1974); and Kan et al, PNAS
75:5631-5635 (1978). Each of the above references is
incorporated herein by reference in its entirety.
Extraction procedures for samples such as blood, semen,
hair follicles, mucous membrane epithelium and
other sources of genomic DNA are well known. For plant
cells, digestion of the cells with cellulase releases
DNA. Thereafter DNA is purified as described above.

The extracted DNA can be purified by dialysis,
chromatography, or other known methods for purifying
polynucleotides prior to amplification. Typically, the
DNA is not purified prior to amplification.

The amplified DNA sequence is produced by using
the portion of the DNA and its complement bounded by the
primer pair as a template. As a first step in the
method, the DNA strands are separated into single
stranded DNA. This strand separation can be
accomplished by a number of methods including physical
or chemical means. A preferred method is the physical
method of separating the strands by heating the DNA
until it is substantially (approximately 93%) denatured.
Heat denaturation involves temperatures ranging from
about 80° to 105 °C for times ranging from about 15 to 30
seconds. Typically, heating the DNA to a temperature of
from 90° to 93 °C for about 30 seconds to about 1 minute
is sufficient.

The primer extension product(s) produced are
complementary to the primer-defined region of the DNA
and hybridize therewith to form a duplex of equal length
strands. The duplexes of the extension products and
their templates are then separated into single-stranded
DNA. When the complementary strands of the duplexes are
separated, the strands are ready to be used as a
template for the next cycle of synthesis of additional
DNA strands.

Each of the synthesis steps can be performed using
conditions suitable for DNA amplification. Generally,
the amplification step is performed in a buffered
aqueous solution, preferably at a pH of about 7 to about
9, more preferably about pH 8. A suitable amplification
buffer contains Tris-HCl as a buffering agent in the
range of about 10 to 100 mM. The buffer also includes a
monovalent salt, preferably at a concentration of at
least about 10 mM and not greater than about 60 mM.
Preferred monovalent salts are KCl, NaCl and (NH4)2SO4.
The buffer also contains MgCl2 at about 5 to 50 mM.
Other buffering systems such as HEPES or glycine-NaOH
and potassium phosphate buffers can be used. Typically,
the total volume of the amplification reaction mixture
is about 50 to 100 µl.

Preferably, for genomic DNA, a molar excess of
about 10^6:1 primer:template of the primer pair is added
to the buffer containing the separated DNA template
strands. A large molar excess of the primers improves
the efficiency of the amplification process. In
general, about 100 to 150 ng of each primer is added.

The deoxyribonucleotide triphosphates dATP, dCTP,
dGTP and dTTP are also added to the amplification
mixture in amounts sufficient to produce the amplified
DNA sequences. Preferably, the dNTPs are present at a
concentration of about 0.75 to about 4.0 mM, more
preferably about 2.0 mM. The resulting solution is
heated to about 90° to 93 °C for from about 30 seconds to
about 1 minute to separate the strands of the DNA.
After this heating period the solution is cooled to the
amplification temperature.

Following separation of the DNA strands, the
primers are allowed to anneal to the strands. The
annealing temperature varies with the length and GC
content of the primers. Those variables are reflected
in the Tm of each primer. Exemplary HLA DQA1 primers of
this invention, described below, require temperatures of
about 55 °C. The exemplary HLA Class I primers of this
invention require slightly higher temperatures of about
62° to about 68 °C. The extension reaction step is
performed following annealing of the primers to the
genomic DNA.

An appropriate agent for inducing or catalyzing
the primer extension reaction is added to the
amplification mixture either before or after the strand
separation (denaturation) step, depending on the
stability of the agent under the denaturation
conditions. The DNA synthesis reaction is allowed to
occur under conditions which are well known in the art.
This synthesis reaction (primer extension) can occur at
from room temperature up to a temperature above which
the polymerase no longer functions efficiently.
Elevating the amplification temperature enhances the
stringency of the reaction. As stated previously,
stringent conditions are necessary to ensure that the
amplified sequence and the DNA template sequence contain
the same nucleotide sequence, since substitution of
nucleotides can alter the restriction sites or probe
binding sites in the amplified sequence.

The inducing agent may be any compound or system
which facilitates synthesis of primer extension
products, preferably enzymes. Suitable enzymes for this
purpose include DNA polymerases (such as, for example,
E. coli DNA polymerase I, Klenow fragment of E. coli DNA
polymerase I, T4 DNA polymerase), reverse transcriptase,
and other enzymes (including heat-stable polymerases)
which facilitate combination of the nucleotides in the
proper manner to form the primer extension products.
Most preferred is Taq polymerase or other heat-stable
polymerases which facilitate DNA synthesis at elevated
temperatures (about 60° to 90 °C). Taq polymerase is
described, e.g., by Chien et al, J. Bacteriol.,
127:1550-1557 (1976). That article is incorporated
herein by reference in its entirety. When the extension
step is performed at about 72 °C, about 1 minute is
required for every 1000 bases of target DNA to be
amplified.
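
As a worked illustration of the extension time stated
above, the following Python sketch scales the figure of
about 1 minute per 1000 bases linearly with target
length. The function name and the linear scaling to
other lengths are assumptions made for illustration.

```python
import math


def extension_seconds(target_length_nt, seconds_per_kb=60):
    """Extension time per cycle at about 72 deg C, assuming the
    stated ~1 minute per 1000 bases scales linearly with length.
    The result is rounded up to a whole second."""
    return math.ceil(target_length_nt * seconds_per_kb / 1000)


for length in (500, 1000, 1500):
    print(length, "nt ->", extension_seconds(length), "s")
```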

The synthesis of the amplified sequence is 
initiated at the 3' end of each primer and proceeds
toward the 5' end of the template along the template DNA 
strand, until synthesis terminates, producing DNA 
sequences of different lengths. The newly synthesized 
strand and its complementary strand form a double- 
stranded molecule which is used in the succeeding steps 
of the process. In the next step, the strands of the 
double-stranded molecule are separated (denatured) as 
described above to provide single-stranded molecules. 

New DNA is synthesized on the single-stranded 
template molecules. Additional polymerase, nucleotides 
and primers can be added if necessary for the reaction 
to proceed under the conditions described above. After 
this step, half of the extension product consists of the 
amplified sequence bounded by the two primers. The 
steps of strand separation and extension product 
synthesis can be repeated as many times as needed to 
produce the desired quantity of the amplified DNA 
sequence. The amount of the amplified sequence produced 
accumulates exponentially. Typically, about 25 to 30
cycles are sufficient to produce a suitable amount of 
the amplified DNA sequence for analysis. 
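
The exponential accumulation described above can be
made concrete with a short Python sketch. It assumes an
idealized reaction in which the number of copies is
multiplied by (1 + efficiency) in every cycle; the
function name and the efficiency parameter are
illustrative assumptions, and real reactions fall short
of perfect doubling, especially in later cycles.

```python
def amplified_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after a number of cycles.
    efficiency=1.0 corresponds to perfect doubling per cycle."""
    return initial_copies * (1.0 + efficiency) ** cycles


# One starting copy, perfect doubling (an upper bound).
for n in (10, 25, 30):
    print(n, "cycles ->", f"{amplified_copies(1, n):.3g}", "copies")
```

Under this idealization a single template yields on the
order of 10^7 copies after 25 cycles and 10^9 copies
after 30 cycles.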

The amplification method can be performed in a
step-wise fashion where after each step new reagents are
added, or simultaneously, where all reagents are added
at the initial step, or partially step-wise and
partially simultaneously, where fresh reagent is added
after a given number of steps. The amplification
reaction mixture can contain, in addition to the sample
genomic DNA, the four nucleotides, the primer pair in
molar excess, and the inducing agent, e.g., Taq
polymerase.

Each step of the process occurs sequentially
notwithstanding the initial presence of all the
reagents. Additional materials may be added as
necessary. Typically, the polymerase is not replenished
when using a heat-stable polymerase. After the
appropriate number of cycles to produce the desired
amount of the amplified sequence, the reaction may be
halted by inactivating the enzymes, separating the
components of the reaction or stopping the thermal
cycling.

In a preferred embodiment of the method, the
amplification includes the use of a second primer pair
to perform a second amplification following the first
amplification. The second primer pair defines a DNA
sequence which is a portion of the first amplified
sequence. That is, at least one of the primers of the
second primer pair defines one end of the second
amplified sequence which is within the ends of the first
amplified sequence. In this way, the use of the second
primer pair helps to ensure that any amplified sequence
produced in the second amplification reaction is
specific for the tested locus. That is, non-target
sequences which may be copied by a locus-specific pair
are unlikely to contain sequences that hybridize with a
second locus-specific primer pair located within the
first amplified sequence.
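
The positional relationship between the two amplified
sequences can be expressed as a simple coordinate test,
sketched here in Python. The function name and the
coordinates are hypothetical; positions are taken to be
start and end coordinates of each amplified sequence on
the same reference sequence.

```python
def is_nested(first, second):
    """Return True if the second amplified sequence is a portion of
    the first and at least one of its ends lies within the ends of
    the first. Each argument is a (start, end) tuple."""
    f_start, f_end = first
    s_start, s_end = second
    contained = f_start <= s_start and s_end <= f_end
    has_internal_end = s_start > f_start or s_end < f_end
    return contained and has_internal_end


print(is_nested((100, 900), (150, 850)))  # both ends internal
print(is_nested((100, 900), (150, 900)))  # one end shared, one internal
print(is_nested((100, 900), (100, 900)))  # identical: not nested
```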

In another embodiment, the second primer pair is
specific for one allele of the locus. In this way,
detection of the presence of a second amplified sequence
indicates that the allele is present in the sample. The
presence of a second amplified sequence can be
determined by quantitating the amount of DNA at the
start and the end of the second amplification reaction.
Methods for quantitating DNA are well known and include
determining the optical density at 260 nm (OD260), and
preferably additionally determining the ratio of the
optical density at 260 nm to the optical density at
280 nm (OD260/OD280) to determine the amount of DNA in
comparison to protein in the sample.
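
The quantitation described above can be sketched in
Python as follows. The conversion factor of about 50 µg
per ml of double-stranded DNA per OD260 unit is a
commonly used approximation and is not stated in this
application; the function names, the example readings
and the two-fold threshold used to decide that
amplification occurred are likewise assumptions made
for illustration.

```python
UG_PER_ML_PER_OD260 = 50.0  # assumed approximation for double-stranded DNA


def dsdna_ug_per_ml(od260, dilution_factor=1.0):
    """Estimate DNA concentration from an OD260 reading."""
    return od260 * UG_PER_ML_PER_OD260 * dilution_factor


def od_ratio(od260, od280):
    """OD260/OD280 ratio, a rough guide to DNA relative to protein."""
    return od260 / od280


def amplification_occurred(od260_start, od260_end, min_fold_increase=2.0):
    """Compare readings at the start and end of the second
    amplification. The fold threshold is a hypothetical cutoff."""
    return od260_end >= od260_start * min_fold_increase


# Hypothetical readings taken before and after the second amplification.
print(dsdna_ug_per_ml(0.2))
print(round(od_ratio(0.9, 0.5), 2))
print(amplification_occurred(0.05, 0.40))
```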

Preferably, the first amplification will contain
sufficient primer for only a limited number of primer
extension cycles, e.g. less than 15, preferably about 10
to 12 cycles, so that the amount of amplified sequence
produced by the process is sufficient for the second
amplification but does not interfere with a
determination of whether amplification occurred with the
second primer pair. Alternatively, the amplification
reaction can be continued for additional cycles and
aliquoted to provide appropriate amounts of DNA for one
or more second amplification reactions. Approximately
100 to 150 ng of each primer of the second primer pair
is added to the amplification reaction mixture. The
second set of primers is preferably added following the
initial cycles with the first primer pair. The amount
of the first primer pair can be limited in comparison to
the second primer pair so that, following addition of
the second pair, substantially all of the amplified
sequences will be produced by the second pair.

As stated previously, the DNA can be quantitated
to determine whether an amplified sequence was produced
in the second amplification. If protein in the reaction
mixture interferes with the quantitation (usually due to
the presence of the polymerase), the reaction mixture
can be purified, as by using a 100,000 MW cut off
filter. Such filters are commercially available from
Millipore and from Centricon.

Analysis of the Amplified DNA Sequence

As discussed previously, the method used to
analyze the amplified DNA sequence to characterize the
allele(s) present in the sample DNA depends on the
genetic variation in the sequence. When distinctions
between alleles include primer-defined length
polymorphisms, the amplified sequences are separated
based on length, preferably using gel or capillary
electrophoresis. When using probe hybridization for
analysis, the amplified sequences are reacted with
labeled probes. When the analysis is based on RFLP
fragment patterns, the amplified sequences are digested
with one or more restriction endonucleases to produce a
digest and the resultant fragments are separated based
on length, preferably using gel or capillary
electrophoresis. When the only variation encompassed by
the amplified sequence is a sequence variation that does
not result in a change in length or a change in a
restriction site and is unsuitable for detection by a
probe, the amplified DNA sequences are sequenced.

Procedures for each step of the various analytical
methods are well known and are described below.

Production of RFLP Fragment Patterns

Restriction endonucleases

A restriction endonuclease is an enzyme that
cleaves or cuts DNA hydrolytically at a specific
nucleotide sequence called a restriction site.
Endonucleases that produce blunt end DNA fragments
(hydrolysis of the phosphodiester bonds on both DNA
strands occurs at the same site) as well as endonucleases
that produce sticky ended fragments (the hydrolysis
sites on the strands are separated by a few nucleotides
from each other) can be used.

Restriction enzymes are available commercially
from a number of sources including Sigma
Pharmaceuticals, Bethesda Research Labs, Boehringer-
Mannheim and Pharmacia. As stated previously, a
restriction endonuclease used in the present invention
cleaves an amplified DNA sequence of this invention to
produce a digest comprising a set of fragments having
distinctive fragment lengths. In particular, the
fragments for one allele of a locus differ in size from
the fragments for other alleles of the locus. The
patterns produced by separation and visualization of the
fragments of a plurality of digests are sufficient to
distinguish each allele of the locus. More
particularly, the endonucleases are chosen so that by
using a plurality of digests of the amplified sequence,
preferably fewer than five, more preferably two or three
digests, the alleles of a locus can be distinguished.

In selecting an endonuclease, the important
consideration is the number of fragments produced for
amplified sequences of the various alleles of a locus.
More particularly, a sufficient number of fragments must
be produced to distinguish between the alleles and, if
required, to provide for individuality determinations.
However, the number of fragments must not be so large or
so similar in size that a pattern that is not
distinguishable from those of other haplotypes by the
particular detection method is produced. Preferably,
the fragments are of distinctive sizes for each allele.
That is, for each endonuclease digest of a particular
amplified sequence, the fragments for an allele
preferably differ from the fragments for every other
allele of the locus by at least 10, preferably 20, more
preferably 30, most preferably 50 or more nucleotides.
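
A comparison of fragment patterns along these lines can
be sketched in Python. The function name and the
fragment lengths are hypothetical. The test applied
here is deliberately looser than the stated preference:
two patterns are treated as distinguishable when at
least one fragment in either pattern has no counterpart
within the minimum length difference in the other.

```python
def patterns_distinguishable(frags_a, frags_b, min_diff_nt=10):
    """Return True if some fragment in either pattern differs by at
    least min_diff_nt from every fragment in the other pattern."""
    def has_unique_fragment(x, y):
        return any(all(abs(f - g) >= min_diff_nt for g in y) for f in x)

    return (has_unique_fragment(frags_a, frags_b)
            or has_unique_fragment(frags_b, frags_a))


# Hypothetical fragment lengths (nt) from one digest of two alleles.
print(patterns_distinguishable([121, 185], [306]))       # True
print(patterns_distinguishable([100, 200], [105, 198]))  # False
```

Raising min_diff_nt to 20, 30 or 50 applies the more
preferred thresholds given above.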

One of ordinary skill can readily determine
whether an endonuclease produces RFLP fragments having
distinctive fragment lengths. The determination can be
made experimentally by cleaving an amplified sequence
for each allele with the designated endonuclease in the
invention method. The fragment patterns can then be
analyzed. Distinguishable patterns will be readily
recognized by determining whether comparison of two or
more digest patterns is sufficient to demonstrate
characteristic differences between the patterns of the
alleles.

The number of digests that need to be prepared for 
any particular analysis will depend on the desired 
information and the particular sample to be analyzed. 
Since HLA analyses are used for a variety of purposes 
ranging from individuality determinations for forensics 
and paternity to tissue typing for transplantation, the 
HLA complex will be used as exemplary. 

A single digest may be sufficient to determine 
that an individual cannot be the person whose blood was 
found at a crime scene. In general, however, where the 
DNA samples do not differ, the use of two to three 
digests for each of two to three HLA loci will be 
sufficient for matching applications (forensics, 
paternity). For complete HLA typing, each locus needs
to be determined. 

In a preferred embodiment, sample HLA DNA 
sequences are divided into aliquots containing similar 
amounts of DNA per aliquot and are amplified with primer 
pairs (or combinations of primer pairs) to produce 
amplified DNA sequences for a number of HLA loci. Each 
amplification mixture contains only primer pairs for one 
HLA locus. The amplified sequences are preferably 
processed concurrently, so that a number of digest RFLP 
fragment patterns can be produced from one sample. In 
this way, the HLA type for a number of alleles can be 
determined simultaneously. 

Alternatively, preparation of a number of RFLP
fragment patterns provides additional comparisons of
patterns to distinguish samples for forensic and
paternity analyses where analysis of one locus
frequently fails to provide sufficient information for
the determination when the sample DNA has the same
allele as the DNA to which it is compared.

Production of RFLP fragments

Following amplification, the amplified DNA
sequence is combined with an endonuclease that cleaves
or cuts the amplified DNA sequence hydrolytically at a
specific restriction site. The combination of the
endonuclease with the amplified DNA sequence produces a
digest containing a set of fragments having distinctive
fragment lengths. U.S. Patent No. 4,582,788 (to Erlich,
issued April 15, 1986) describes an HLA typing method
based on restriction fragment length polymorphism
(RFLP). That patent is incorporated herein by reference
in its entirety.

In a preferred embodiment, two or more aliquots of
the amplification reaction mixture having approximately
equal amounts of DNA per aliquot are prepared.
Conveniently about 5 to about 10 µl of a 100 µl reaction
mixture is used for each aliquot. Each aliquot is
combined with a different endonuclease to produce a
plurality of digests. In this way, by using a number of
endonucleases for a particular amplified DNA sequence,
locus-specific combinations of endonucleases that
distinguish a plurality of alleles of a particular locus
can be readily determined. Following preparation of the
digests, each of the digests can be used to form RFLP
patterns. Preferably, two or more digests can be pooled
prior to pattern formation.

Alternatively, two or more restriction
endonucleases can be used to produce a single digest.
The digest differs from one where each enzyme is used
separately and the resultant fragments are pooled since
fragments produced by one enzyme may include one or more
restriction sites recognized by another enzyme in the
digest. Patterns produced by simultaneous digestion by
two or more enzymes will include more fragments than
pooled products of separate digestions using those
enzymes and will be more complex to analyze.
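
The difference between a simultaneous digest and pooled single-enzyme digests can be illustrated with a toy sequence. The sketch below is an assumption-laden example rather than part of the described method: the helper names, the amplicon and the enzyme pair (EcoRI, G^AATTC; BamHI, G^GATCC) were chosen only for illustration.

```python
def cut_sites(sequence, site, offset):
    """Positions at which an enzyme with recognition `site` cuts `sequence`."""
    positions = []
    hit = sequence.find(site)
    while hit != -1:
        positions.append(hit + offset)
        hit = sequence.find(site, hit + 1)
    return positions


def fragment_lengths(sequence, enzymes):
    """Fragment lengths when all `enzymes` act on the same molecules."""
    cuts = sorted({p for site, off in enzymes for p in cut_sites(sequence, site, off)})
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


ECORI = ("GAATTC", 1)  # G^AATTC
BAMHI = ("GGATCC", 1)  # G^GATCC

# Hypothetical 107-nt amplicon with one site for each enzyme
amplicon = "A" * 30 + "GAATTC" + "C" * 40 + "GGATCC" + "T" * 25

pooled = sorted(fragment_lengths(amplicon, [ECORI]) + fragment_lengths(amplicon, [BAMHI]))
double = sorted(fragment_lengths(amplicon, [ECORI, BAMHI]))
print(pooled)  # [30, 31, 76, 77]
print(double)  # [30, 31, 46]
```

In this toy case the 46-nt fragment arises only when both enzymes act on the same molecule, because a fragment released by one enzyme still carries a site for the other.
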

Furthermore, one or more restriction endonucleases
can be used to digest two or more amplified DNA
sequences. That is, for more complete resolution of all
the alleles of a locus, it may be desirable to produce
amplified DNA sequences encompassing two different
regions. The amplified DNA sequences can be combined
and digested with at least one restriction endonuclease
to produce RFLP patterns.

The digestion of the amplified DNA sequence with
the endonuclease can be carried out in an aqueous
solution under conditions favoring endonuclease
activity. Typically the solution is buffered to a pH of
about 6.5 to 8.0. Mild temperatures, preferably about
20 °C to about 45 °C, more preferably physiological
temperatures (25 °C to 40 °C), are employed. Restriction
endonucleases normally require magnesium ions and, in
some instances, cofactors (ATP and S-adenosyl
methionine) or other agents for their activity.
Therefore, a source of such ions, for instance inorganic
magnesium salts, and other agents, when required, are
present in the digestion mixture. Suitable conditions
are described by the manufacturer of the endonuclease
and generally vary as to whether the endonuclease
requires high, medium or low salt conditions for optimal
activity.

The amount of DNA in the digestion mixture is
typically in the range of 1% to 20% by weight. In most
instances 5 to 20 µg of total DNA digested to completion
provides an adequate sample for production of RFLP
fragments. Excess endonuclease, preferably one to five
units/µg DNA, is used.
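
The enzyme excess stated above reduces to simple arithmetic. A minimal sketch follows; the function name and the 10 µg example are illustrative assumptions, while the one-to-five units/µg range is taken from the text.

```python
def enzyme_units(dna_ug: float, units_per_ug: float) -> float:
    """Units of endonuclease for `dna_ug` micrograms of DNA.

    The preferred excess stated in the text is one to five units per
    microgram; values outside that range are rejected in this sketch.
    """
    if not 1.0 <= units_per_ug <= 5.0:
        raise ValueError("preferred range is 1 to 5 units per microgram")
    return dna_ug * units_per_ug


# Example: 10 micrograms of DNA digested at 3 units per microgram
units_needed = enzyme_units(10, 3)
print(units_needed)  # 30

# Span implied by the stated ranges (5 to 20 micrograms, 1 to 5 units per microgram)
print(enzyme_units(5, 1), enzyme_units(20, 5))  # 5 100
```
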

The set of fragments in the digest is preferably
further processed to produce RFLP patterns which are
analyzed. If desired, the digest can be purified by
precipitation and resuspension as described by Kan et
al, PNAS 75:5631-5635 (1978), prior to additional
processing. That article is incorporated herein by
reference in its entirety.

Once produced, the fragments are analyzed by well
known methods. Preferably, the fragments are analyzed
using electrophoresis. Gel electrophoresis methods are
described in detail hereinafter. Capillary
electrophoresis methods can be automated (as by using
the Model 207A analytical capillary electrophoresis
system from Applied Biosystems of Foster City, CA) and
are described in Chin et al, American Biotechnology
Laboratory News Edition, December, 1989.

Electrophoretic Separation of DNA Fragments

Electrophoresis is the separation of DNA sequence
fragments contained in a supporting medium by size and
charge under the influence of an applied electric field.
Gel sheets or slabs, e.g. agarose, agarose-acrylamide or
polyacrylamide, are typically used for nucleotide sizing
gels. The electrophoresis conditions affect the degree
of resolution of the fragments. A degree of
resolution that separates fragments that differ in size
from one another by as little as 10 nucleotides is
usually sufficient. Preferably, the gels will be
capable of resolving fragments which differ by 3 to 5
nucleotides. However, for some purposes (where the
differences in sequence length are large),
discrimination of sequence differences of at least
100 nt may be sufficiently sensitive for the analysis.
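
The effect of gel resolution on the observed pattern can be pictured with a simplified model in which fragments closer in size than the resolution limit run as one band. The grouping rule, the function name and the fragment sizes below are assumptions for illustration; real band separation also depends on gel composition and run conditions.

```python
def resolved_bands(fragment_lengths, resolution_nt):
    """Group fragments that differ by less than `resolution_nt` into one band.

    Simplified chaining model: each fragment joins the current band when it
    is closer than the resolution limit to the previous (smaller) fragment.
    """
    bands = []
    for length in sorted(fragment_lengths):
        if bands and length - bands[-1][-1] < resolution_nt:
            bands[-1].append(length)
        else:
            bands.append([length])
    return bands


fragments = [100, 104, 112, 250, 340]  # hypothetical digest
print(len(resolved_bands(fragments, 3)))    # 5 bands
print(len(resolved_bands(fragments, 10)))   # 3 bands
print(len(resolved_bands(fragments, 100)))  # 2 bands
```

With these toy sizes, a gel resolving 3-nt differences shows every fragment, a 10-nt gel merges the three smallest, and a 100-nt gel separates only the small group from the large one.
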

Preparation and staining of analytical gels is
well known. For example, a 3% Nusieve 1% agarose gel
which is stained using ethidium bromide is described in
Boerwinkle et al, PNAS, 86:212-216 (1989). Detection of
DNA in polyacrylamide gels using silver stain is
described in Goldman et al, Electrophoresis, 3:24-26
(1982); Marshall, Electrophoresis, 4:269-272 (1983);
Tegelstrom, Electrophoresis, 7:226-229 (1987); and Allen
et al, BioTechniques 7:736-744 (1989). The method
described by Allen et al, using large-pore size
ultrathin-layer, rehydratable polyacrylamide gels
stained with silver is preferred. Each of those
articles is incorporated herein by reference in its
entirety.

Size markers can be run on the same gel to permit
estimation of the size of the restriction fragments.
Comparison to one or more control sample(s) can be made
in addition to or in place of the use of size markers.
The size markers or control samples are usually run in
one or both of the lanes at the edge of the gel, and
preferably, also in at least one central lane. In
carrying out the electrophoresis, the DNA fragments are
loaded onto one end of the gel slab (commonly called the
"origin") and the fragments separate by electrically
facilitated transport through the gel, with the shortest
fragment electrophoresing from the origin towards the
other (anode) end of the slab at the fastest rate. An
agarose slab gel is typically electrophoresed using
about 100 volts for 30 to 45 minutes. A polyacrylamide
slab gel is typically electrophoresed using about 200 to
1,200 volts for 45 to 60 minutes.
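
One common way to turn marker positions into a size estimate is to interpolate the logarithm of fragment size against migration distance between the two flanking markers. The specification does not prescribe this calculation; the semi-log interpolation, the function name and the marker values below are assumptions used only to illustrate the idea.

```python
import math


def estimate_size(distance_mm, markers):
    """Estimate fragment size (nt) from its migration distance.

    `markers` is a list of (size_nt, distance_mm) pairs with distinct
    distances. log10(size) is interpolated linearly between the two
    markers that flank the measured distance.
    """
    points = sorted(markers, key=lambda m: m[1])
    for (size1, dist1), (size2, dist2) in zip(points, points[1:]):
        if dist1 <= distance_mm <= dist2:
            fraction = (distance_mm - dist1) / (dist2 - dist1)
            log_size = math.log10(size1) + fraction * (math.log10(size2) - math.log10(size1))
            return 10 ** log_size
    raise ValueError("distance lies outside the range covered by the markers")


# Hypothetical markers: 1000 nt at 10 mm and 100 nt at 50 mm from the origin
ladder = [(1000, 10.0), (100, 50.0)]
estimate = estimate_size(30.0, ladder)
print(round(estimate))  # 316
```

Running markers in both an edge lane and a central lane, as described above, gives a check that the distance-to-size relationship is the same across the gel.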

After electrophoresis, the gel is readied for
visualization. The DNA fragments can be visualized by
staining the gel with a nucleic acid-specific stain such
as ethidium bromide or, preferably, with silver stain,
which is not specific for DNA. Ethidium bromide
staining is described in Boerwinkle et al, supra.
Silver staining is described in Goldman et al, supra,
Marshall, supra, Tegelstrom, supra, and Allen et al,
supra.

Probes 

Allele-specific oligonucleotides or probes are
used to identify DNA sequences which have regions that
hybridize with the probe sequence. The amplified DNA
sequences defined by a locus-specific primer pair can be
used as probes in RFLP analyses using genomic DNA. U.S.
Patent No. 4,582,788 (to Erlich, issued April 15, 1986)
describes an exemplary HLA typing method based on
analysis of RFLP patterns produced by genomic DNA. The
analysis uses cDNA probes to analyze separated DNA
fragments in a Southern blot type of analysis. As
stated in the patent, "[C]omplementary DNA probes that
are specific to one (locus-specific) or more
(multilocus) particular HLA DNA sequences involved in
the polymorphism are essential components of the
hybridization step of the typing method" (col. 6,
l. 3-7).

The amplified DNA sequences of the present method
can be used as probes in the method described in that
patent or in the present method to detect the presence
of an amplified DNA sequence of a particular allele.
More specifically, an amplified DNA sequence having a
known allele can be produced and used as a probe to
detect the presence of the allele in sample DNA which is
amplified by the present method.

Preferably, however, when a probe is used to
distinguish alleles in the amplified DNA sequences of
the present invention, the probe has a relatively short
sequence (in comparison to the length of the amplified
DNA sequence) which minimizes the sequence homology of
other alleles of the locus with the probe sequence.
That is, the probes will correspond to a region of the
amplified DNA sequence which has the largest number of
nucleotide differences from the amplified DNA sequences
of other alleles produced using that primer pair.
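
The selection rule just described (a short region with the largest number of differences from the other alleles) can be sketched as a sliding-window search over aligned sequences. The scoring by total mismatches, the function name and the three toy sequences are assumptions for illustration and are not taken from the tables.

```python
def best_probe_window(target, others, window):
    """Return (start, score) of the window in `target` that differs most.

    All sequences must already be aligned and of equal length. The score
    is the total number of mismatches between the target window and the
    same window in every other allele; the first best window wins ties.
    """
    if any(len(other) != len(target) for other in others):
        raise ValueError("sequences must be aligned to the same length")
    # Number of other alleles that differ from the target at each position
    diffs = [sum(other[i] != target[i] for other in others) for i in range(len(target))]
    best_start, best_score = 0, -1
    for start in range(len(target) - window + 1):
        score = sum(diffs[start:start + window])
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Hypothetical aligned 20-nt stretches of three alleles
target = "ACGTACGTACGTTTTTACGT"
other1 = "ACGTACGTACGTAAAAACGT"
other2 = "ACGTACGAACGTTTTAACGT"

start, score = best_probe_window(target, [other1, other2], 4)
print(start, score, target[start:start + 4])  # 12 5 TTTT
```

Scoring by the smallest mismatch count against any single allele, instead of the total, would favor probes that discriminate against the most similar allele; either choice is a design decision outside the text.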

The probes can be labeled with a detectable atom,
radical or ligand using known labeling techniques.
Radiolabels, usually 32P, are typically used. The
probes can be labeled with 32P by nick translation with
an α-32P-dNTP (Rigby et al, J. Mol. Biol., 113:237
(1977)) or other available procedures to make the
locus-specific probes for use in the methods described
in the patent. The probes are preferably labeled with an
enzyme, such as hydrogen peroxidase. Methods for
coupling enzyme labels to nucleotide sequences are well
known. Each of the above references is incorporated
herein by reference in its entirety.

The analysis method known as "Southern blotting",
described by Southern, J. Mol. Biol., 98:503-517
(1975), relies on the use of probes. In Southern
blotting the DNA fragments are electrophoresed,
transferred and affixed to a support that binds nucleic
acid, and hybridized with an appropriately labeled cDNA
probe. Labeled hybrids are detected by autoradiography
or, preferably, by use of enzyme labels.

Reagents and conditions for blotting are described
by Southern, supra; Wahl et al, PNAS 76:3683-3687 (1979);
Kan et al, PNAS, supra; U.S. Pat. No. 4,302,204; and
Molecular Cloning: A Laboratory Manual by Maniatis et
al, Cold Spring Harbor Laboratory 1982. After the
transfer is complete the paper is separated from the gel
and is dried. Hybridization (annealing) of the resolved
single stranded DNA on the paper to a probe is effected
by incubating the paper with the probe under hybridizing
conditions. See Southern, supra; Kan et al, PNAS, supra
and U.S. Pat. No. 4,302,204, col 5, line 8 et seq.
Complementary DNA probes specific for one allele, one
locus (locus-specific) or more are essential components
of the hybridization step of the typing method.
Locus-specific probes can be made by the amplification
method for locus-specific amplified sequences, described
above. The probes are made detectable by labeling as
described above.

The final step in the Southern blotting method is
identifying labeled hybrids on the paper (or gel in the
solution hybridization embodiment). Autoradiography can
be used to detect radiolabel-containing hybrids. Enzyme
labels are detected by use of a color development system
specific for the enzyme. In general, the enzyme cleaves
a substrate, and the cleavage causes the substrate
either to develop or to change color. The color can be
visually perceptible in natural light or can be produced
by a fluorochrome which is excited by a known wavelength
of light.

Sequencing 

Genetic variations in amplified DNA sequences
which reflect allelic difference in the sample DNA can
also be detected by sequencing the amplified DNA
sequences. Methods for sequencing oligonucleotide
sequences are well known and are described in, for
example, Molecular Cloning: A Laboratory Manual by
Maniatis et al, Cold Spring Harbor Laboratory 1982.
Currently, sequencing can be automated using a number
of commercially available instruments.

Due to the amount of time currently required to
obtain sequencing information, other analysis methods,
such as gel electrophoresis of the amplified DNA
sequences or a restriction endonuclease digest thereof,
are preferred for clinical analyses.

Kits 

As stated previously, the kits of this invention
comprise one or more of the reagents used in the above
described methods. In one embodiment, a kit comprises
at least one genetic locus-specific primer pair in a
suitable container. Preferably the kit contains two or
more locus-specific primer pairs. In one embodiment,
the primer pairs are for different loci and are in
separate containers. In another embodiment, the primer
pairs are specific for the same locus. In that
embodiment, the primer pairs will preferably be in the
same container when specific for different alleles of
the same genetic locus and in different containers when
specific for different portions of the same allele
sequence. Sets of primer pairs which are used
sequentially can be provided in separate containers in
one kit. The primers of each pair can be in separate
containers, particularly when one primer is used in each
set of primer pairs. However, each pair is preferably
provided at a concentration which facilitates use of the
primers at the concentrations required for all
amplifications in which it will be used.

The primers can be provided in a small volume
(e.g. 100 µl) of a suitable solution such as sterile
water or Tris buffer and can be frozen. Alternatively,
the primers can be air dried.

In another embodiment, a kit comprises, in
separate containers, two or more endonucleases useful in
the methods of this invention. The kit will preferably
contain a locus-specific combination of endonucleases.
The endonucleases can be provided in a suitable solution
such as normal saline or physiologic buffer with 50%
glycerol (at about -20 °C) to maintain enzymatic
activity.

The kit can contain one or more locus-specific
primer pairs together with locus-specific combinations
of endonucleases and may additionally include a control.
The control can be an amplified DNA sequence defined by
a locus-specific primer pair or DNA having a known HLA
type for a locus of interest.

Additional reagents such as amplification buffer,
digestion buffer, a DNA polymerase and nucleotide
triphosphates can be provided separately or in the kit.
The kit may additionally contain gel preparation and
staining reagents or preformed gels.

Analyses of exemplary genetic loci are described
below.

Analysis of HLA Type

The present method of analysis of genetic
variation in an amplified DNA sequence to determine
allelic difference in sample DNA can be used to
determine HLA type. Primer pairs that specifically
amplify genomic DNA associated with one HLA locus are
described in detail hereinafter. In a preferred
embodiment, the primers define a DNA sequence that
contains all exons that encode allelic variability
associated with the HLA locus together with at least a
portion of one of the adjacent intron sequences. For
Class I loci, the variable exons are the second and
third exons. For Class II loci, the variable exon is
the second exon. The primers are preferably located so
that a substantial portion of the amplified sequence
corresponds to intron sequences.

The intron sequences provide restriction sites
that, in comparison to cDNA sequences, provide
additional information about the individual; e.g., the
haplotype. Inclusion of exons within the amplified DNA
sequences does not provide as many genetic variations
that enable distinction between alleles as an intron
sequence of the same length, particularly for constant
exons. This additional intron sequence information is
particularly valuable in paternity determinations and in
forensic applications. It is also valuable in typing
for transplant matching in that the variable lengths of
intron sequences included in the amplified sequence
produced by the primers enable a distinction to be made
between certain heterozygotes (two different alleles)
and homozygotes (two copies of one allele).

Allelic differences in the DNA sequences of HLA
loci are illustrated below. The tables illustrate the
sequence homology of various alleles and indicate
exemplary primer binding sites. Table 1 is an
illustration of the alignment of the nucleotides of the
Class I A2, A3, Ax, A24 (formerly referred to as A9),
B27, B58 (formerly referred to as B17), C1, C2 and C3
allele sequences in intervening sequence (IVS) I and
III. (The gene sequences and their numbering that are
used in the tables and throughout the specification can
be found in the GenBank and/or European Molecular
Biology Laboratories (EMBL) sequence databanks. Those
sequences are incorporated herein by reference in their
entirety.) Underlined nucleotides represent the regions
of the sequence to which exemplary locus-specific or
Class I-specific primers bind.

Table 2 illustrates the alignment of the
nucleotides in IVS I and II of the DQA3 (now DQA1 0301),
DQA1.2 (now DQA1 0102) and DQA4.1 (now DQA1 0501)
alleles of the DQA1 locus (formerly referred to as the
DR4, DR6 and DR3 alleles of the DQA1 locus,
respectively). Underlined nucleotides represent the
regions of the sequence to which exemplary DQA1
locus-specific primers bind.

Table 3 illustrates the alignment of the
nucleotides in IVS I, exon 2 and IVS II of two
individuals having the DQw1v allele (designated
hereinafter as DQw1va and DQw1vb for the upper and lower
sequences in the table, respectively), the DQw2 and DQw8
alleles of the DQB1 locus. Nucleotides indicated in the
DQw1vb, DQw2 and DQw8 allele sequences are those which
differ from the DQw1va sequence. Exon 2 begins and ends
at nt 599 and nt 870 of the DQw1va allele sequence,
respectively. Underlined nucleotides represent the
regions of the sequence to which exemplary DQB1
locus-specific primers bind.

Table 4 illustrates the alignment of the
nucleotides in IVS I, exon 2 and IVS II of the DPB4.1,
DPB9, New and DPw3 alleles of the DPB1 locus.
Nucleotides indicated in the DPB9, New and DPw3 allele
sequences are those which differ from the DPB4.1
sequence. Exon 2 begins and ends at nt 7644 and nt 7907
of the DPB4.1 allele sequence, respectively. Underlined
nucleotides represent the regions of the sequence to
which exemplary DPB1 locus-specific primers bind.
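
Tables 3 and 4 list only the nucleotides that differ from the first (reference) sequence. Recovering a full allele sequence from such a row can be sketched as below. The function name and the 10-nt rows are hypothetical; the sketch assumes the difference row is column-aligned with the reference and that a blank means "same as the reference". How other symbols in the tables (such as "-") should be read is not stated in this passage, so they are simply copied through.

```python
def expand_row(reference: str, diff_row: str) -> str:
    """Rebuild a full allele sequence from a difference-only alignment row.

    A blank column means the allele matches the reference at that position;
    any other character replaces the reference nucleotide.
    """
    diff_row = diff_row.ljust(len(reference))
    return "".join(ref if alt == " " else alt for ref, alt in zip(reference, diff_row))


# Hypothetical aligned rows (not copied from the tables)
reference_row = "GATTTCGTGT"
variant_row = "  C    A  "

full_variant = expand_row(reference_row, variant_row)
print(full_variant)  # GACTTCGAGT
```
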
TABLE 1



Class I Seq 


CI 


1 


GATTACCAATATTGTGOGACCTACrGTATCAATAAAC 


C2 


1 


T 


CI 


38 


AAAMGGAMCTGGTCTCTATGAGAATCTCTACCra 


C2 


38 


G G 


CI 


88 


CACTTCACCAGjTTTAMGAGAAMCTCCTGACTCT 


C2 


88 




B27 


1 


GAGCTCACTCTCTGGCATCAAGTrC TCCGTG 


CI 


138 


AGGGCGAGCTCACTGTCTGGCAGCMGTTOCCCAT 


C2 


138 


T 


A2 


1 


MGCTTACTCTCTGGCACCAAAC TCCATGGGATGATTTTTCCTTCC TAG 


B27 


32 


ATCAGTTTCCCT . 


CI 


188 


TACAAGAGTCCAAGGGGAGAGGTAAGTGTGCTTT AT TTTGCTGGATGTAG 


C2 


187 




A2 


50 


MGAGTCCAGGTQGACAGGTAA GGAGTGGGAGT CAGGGAGTC 


B27 


44 


ACACAAGA TGCAAGAGGAGAGGTAA GGAGT GAG AGQCAGGGAGTC 


CI 


238 


TTTAATATTAGCT GAGGTAAGGTAA . GGC AAAGAGTGGG AGGCAGGGAGTC 


C2 


237 


C - G 


A2 


98 


CAGTTCCAGGGACAGAGATTACGGGATAAAAAGTGAMGGAGAGGGACG GGGCCCAT 


B27 


91 


CAGTT CAGG3ACAGGGATTCCAGGAGGAGAAGTGAAGGGGAAGC GGG TGGGC 


CI 


288 


CAGTT CAGGGACGGGGATTGCAGGAGAAG TGAAGGGGAAG GGGCTGGGGG 


C2 


288 




A2 


149 


GCCGAG GGTTTCTCCCTTGTTTCT CAGACAGCTC TTGGGCCA A GAC 


B27 


141 


GCCACHGQGQGTCTCTC(XTGCT GGAC 


CI 


338 


CAGCC TGGGGGTCTCTCCCTGGTTTXACAGACAGATCXr^ GCC AGGAC 


C2 


337 


GG 


A2 


195 


TCAGGGAGACATTGAGACAGAGC GCTTGGCACAGAAGCAGAGGGGTCAGGG 


B27 


191 


TCAGGCAGACAGTGTGACAAAGAGGCT GGTGTAGGAGAAGAGGGATCAGG 


CI 


388 


TCAGGCACACAGTGTGACAMGATGCrTG3TGTAGGAGMGAGSGATCAG 


C2 


387 


G 


A2 


246 


CGAA GTCCAGGGCCCCAGGCGTTGGCTCTCA 


A3 


1 




Ax 


1 




A24 


1 




B27 


241 


ACGMCGTGCAAGGCCC03GGCG CGG TCHCAGGGTCTCAGGCTCCGAGAG 


CI 


438 


ACGAA GTCCCAGGTCCCGGGCG GGGTTCTCAGGGTCTCAGGCTOCAAGGG 


C2 


438 


-A 



CGGTGTATGGATTGGGGAGTC^ AGTT
T A 

TG G C 

- T 

CCTTGTCTGCATTGGGGAGGCGCACAGTTGGG3 TIXXOCACTCOZACGAGTr 
QDGTGTCTGCACTGGGGAGGCGO0GO3TTGAG3A 



TCTTTTCTCCC TCTOCCMCCTATCTAGGGTXCTTCTTCCTGGAT ACTCAC 
CTG C A G 

10 Ax 61 C A GC AC C 

TG- 

TCACTTCT TCIXrCMOCTATGTCGGGTOCTTCTTGCAGGAT ACTCGT 
G TTCACTTCTTCTCCCMCCTGCGTOGGGT ACTCAT 
T A 

T G G 






A2 


296 


A3 


Q 


Ax 


q 


A24 


11 


B27 


291 


CI 


488 


C2 


488 


A2 


348 


A3 


60 


Ax 


61 


A24 


61 


B27 


344 


CI 


538 


C2 


538 


C3 


1 


A2 


399 


A3 


114 


Ax 


109 


A24 111 


B27 


392 


B58 


1 


CI 


588 


C2 


589 


C3 


36 


A2 


449 


A3 


164 


Ax 


159 


A24 


161 


B27 442 


B58 


12 


CI 


635 


C2 


637 


C3 


87 


A2 


489 


A3 


204 


Ax 


i99 


A24 


201 


B27 482 


B58 


52 


CI 


675 


C2 


677 


C3 


127 



GACGCGGAOXAGTTCTCACTCCCA^ AGAGAAG C 

A A T C A - T 

G 

GACGCGTCCCCATTTC CACTCCCATTGGGTGTCGGGT GTCTAGAGAAG C 

GACGCGTCCCCMTTCCCACT03CATTGGG TCT AGAAG C 

AG 

20 C3 36 -ACCNN G 

CAATCAGTGTCGTCGCGGTCXXX33TTCTAM CCGCACG 

T C 
G C C C C 

A T 
CAATCAGTGTCGCOGGGGTCCCAGTTCrAAAGT CCCCACG 



CAATCA GCGTCTgJGCAGTCCXGGITCTAMGTC CAGT 
C 

GG G 

CACCCACCQGGACTCAGA TTCTCCCCAGAOGXGAGGATQG C C 

: TCGTGGAGACCAGGC 
T " G 

CACCCACCCGGACTCAGA ATCTCCTCAGACGCCGAG ATQCG C 



CACCCACCCGGACTCAGA TTCTCCCCAGAOGOCGAG ATGCG 
G
1st EXON

A2 532 GTCATGGOXCCCGMCCCTCG^ 

A3 262 C C 

Ax 242 C C G A C 

A24 244 G C 

B27 524 GTCACQGOXCODGMCCCI^ 

B58 94 G 

CI 717 GTCATGGCQXCCGMCGCT CATCCTGCIXSCTC^ 

C2 719 

C3 169 G 

A2 574 TGGCCCTGAGCCAGACCTGGGCG G 
A3 305 

Ax 285 C 
A24 287 A 
B27 567 TGGCCCTGACCGAGAOCTGGGCTG 
B58 137 C 
CI 760 TG3CCCTGACCGAGA0CTGGGXT 
C2 762 

C3 212 G 
IVS1

A2 599 GTGAGTGCGGGGTCGGG AGGGAAACG GCC TCTGT GGGGAGAAGCAACGGGCC G 
A3 329 C AC C G T 

Ax 309 A T C T-G — G NG G CG 

A24 311 TCG C C . G CG 

B27 591 GTGAGTGCGGGGTCAGGCAGGGAAATG GCC TCTGT GGGGAGGAGCGAGGGGA CG 
B58 161 G - C 

CI 784 GTGAGTGCGGGGTTGGG AGGGAAACG GCC TCT GCGGAGAGGMCGAGGTGOCCG 
C2 786 G G 

C3 236 T T G G 

A2 652 CCTGGC GGGGGCGCAGGAC03GGGMGCCGCGCCGGGA QGAG ' 

A3 383 G G C 

Ax 357 C G T AG A 

A24 367 A 

B27 645 CAGGC GGGGGCGCAGGAOCCGGGGAGCCGOGCCGGGAGGAGGGT^ 

B58 215 T A 

CI .838 CCCGGC AGG CGCAGGACCCGGGGAGCCGCGCAGGGAGGAGGG^ 

C2 840 G G - AGC 

C3 291 GGA G 

A2 711 CCACTCCTCGTCCCCAG 

A3 4 42 ~ G -C 

Ax 417 TC CT 
A24 426 

B27 703 CCCCTariX23GCCCCAG 
B58 273 

C2 898 T " •• 

C3 351 - 

169. 0018 




IVS3

A2 1515 CTAQCAGGQGQCAQGGQ XG^ 

A3 1245 

Ax 1222 C ACA - 

A24 1228 G 

B27 1508 GTAOCAGGGXAGTGGGGAGOriTX 

B58 1082 

CI 1704 GTACCAQGGGCAGTGGGGAGCCTTOCCCA^ 

C2 1705 T G 

C3 1155 - T G 

A2 1574 ACMGGAGGGGAGAC^TTGGGACCAACACTAGAATATCGCCCTC 
A3 1303 C C G A T T 

Ax 1280 A A A T 

A24 1287 C 

B27 1567 ACGAGMGAGGAGGAAMTGGGATCAGCGCTAGMTGTCGCCCTCCOT 
B58 1141 

CI 1763 ACXAGGAGGGGAGGAAMTGGGATCAGCGCTAGM 
C2 1764 
15 C3 1213 

A2 1627 0CTGAGGGAGA.GGAATCCT03TGGGT TTCCAG ATCCTGTACCAGAGAGTGA 
A3 1356 T . T T T - GA G 

Ax 1333 T T 

A24 1341 T 

B27 1620 G3AGAATGGCATGAGTTTTCCTGAGITTC 
20 B58 1194 

CI 1816 GGAGMTGGGATGAGTTTTCCTGAGTTTC 
C2 1817 
C3 1266 

A2 1678 CTCTGAGGT TCCGCOCrrQCTCTCIXjA CACAATTMGGGATA AAATCTCTGAAGGA 
A3 1406 T G A A -G - 



25 



Ax 1372 G G G - 

A24 1392 C 
B27 1649 CICTGAGGGCXDCOCTCITCICTCT AGGACMTTMGGGATGACGTCTCTGAGGAA 
B58 1223 

CI 1845 CTCTGAG3GCCCCCTCTGCTCTCT AGGACAATTMGGGATGMGTCCTTGAGGAA 
C2 1846 

C3 1295 G A 

30 A2 1733 ATGACGGG MGACGATCCCICGMTACTGATGAGTGGTTCCCTTTG^ 

A3 1460 G T ' T G T G G 

Ax 1426 ATGAA GAG 
A24 1447 A C 

B27 1704 atggaggggmgacagtccctagmtactcatcaggggto:cct 

B58 1278 

CI 1900 ATGGAGGGGMGACAGTCCCTGGMTACIGATCAGGGGTO^ 

C2 1901 

C3 1351 A - . 



A2 1783 ACACAGGCAGCAGCCTTGG3 COCG TGACTTTT OCTCTCAQGariTGTTCT CTGC

A3 1510 C GA G 

Ax 1477 T C 

A24 1497 C A 

5 B27 1755 CTGCAGCAGOCTTGGGAAOCG TGACTTTTOCTCTCAGGOCTTGTTCACAGC 

B58 1329 T T 

CI 1951 CTTTGAQCACIGCAGCAGCTGT CTCTCAG302TTGTTCTCTGC 
C2 1952 

•C3 1411 

A2 1837 TTCACACTCAATCTGTGTGGG03TCTGAG 
10 A3 1560 C 

Ax 1528 C C 

A24 1547 C 
B27 1806 CTCACACTCAGTGTCTTTQGGGCTCTGATTCC^ 
B58 1380 

CI 2013 CTCACGTTCAATGTGTTTGAAGjTT^ 
C2 2014 
C3 1464 C 






A2 1891 TCCACICAGGTCAGGACCAG^ TTTCCACGGAATAG 

A3 1614 TC A 

Ax 1567 T 

A24 1600 A 



/ B27 I860 TCCACICAGATCAGGAQC AGMGTTC^ CGAACTTTCCAATGAATAG 

B58 1434 

20 CI 2067 TOCACTCAGGTCAGGACXAGMGTCGCrGTTOCTOC 

C2 2068 

C3 1518 

A2 1955 GAGATTATGCCAGGTGCCTGTCTCCAGGCTGGTG^ 

A3 1664 — 

Ax 1632 T T C T T 

A24 1650 — - A AT G . 

B27 1925 GAGATTATCCXIAGGTGCCTCCGTC^ CTT CCCCA 

B58 1499 : I 

CI 2132 GAGATTATCCCAGGIGCCTGTGTCCAGGCTGGCGTCTQG^ 

C2 2133 

C3 1583 

30 




A2 2014 1CCCAGGTGTCCTGTCCATTCTCAAGA TAGCCACATGTGTGCTGGAGGAGTGrCOCATG 
A3 1721 G G C T 

Ax 1691 C T CA A G C T 

A24 1706 G CA T 

5 B27 1983 CCCCAGGTGTCCTGTCCATTCTC AGGCTG3TCACATGQGTQGTCCTAGGGTGTCCCATG 

B58 1557 A 

CI 2191 OCCCAGGTGTCCTGTCCATTCTC AGGATQGTCAC^TCQ3CGCTGTTGGAGTGTCGCAAG 
C2 2192 A 
C3 16/42 G 

A2 2073 ACAGATCGAAMTGOCrGMTGATCTGACTCT TCCTGACAG 2113 
10 A3 1780 GC TT C T 1820 

Ax 1750 GC TT TT C T 1791 

A 24 1765 G GCAAAA CT 1784 

B27 2042 AGAGATGCAM GCGOCTCAATTTICTGACTCT^ CAG 2083 

B58 1616 i 1656 

CI 2250 AGAGATACAMGTGTCTGAATTTTCTGA CAG 2290 

C2 2251 G 2292 

15 C3 1701 1741 



TABLE 2

DQA1 Seq

A3 1 GATCTCTGTGTAGAATGTCCTG TTCTGAGCCAGTCCTGA GAGGAAAGGAAGTATAATCAA 

5 A1.2 1 G A 

A4.11C G AACG 

A3 61 TTTGTTATTAACTGATGAAAGAATTAAGTGAAAGATAAACCTTAGGAAGC AGAGGGAAGT 

A1.2 61 CA T C C 

A4.1 61 G T C A 

A3 121 TAA T CTATG ACTAAG AAAGTTAAGTACTCTGATAACTCATTC ATTG CTT CT 

10 A1.2 122 A CCTAA T C C A A 

A4.1 122 A CCTAA C C A CA A 

A3 172 TTTGTTCATTTACATT ATTTAAT CAC AAGTCTATGATGTGC C AGG CT CTC AGG AAATA 
A1.2 178 A T C C A 

A4.1 178 AG T CG A 



15 



A3 230 GTGAAAATTGG CACGCGATATTCTGCCCTTGTGTAGCACACACCGTAGTGGGAAAG 

A1.2 236 A A T G TAG 

A4.1 237 A C ATT G TTA 

A3 28 6 AA GTGCACTTTTAACCGGACAACTATCAACACGAAGCGGGGAGGAAGCAGGGG 

A1.2 293 A T C T A 

A4.1 294 A C A C AT A T 

A3 339 CTGGAAATGTCCACAGACTTTGCCAAA G AC AAAG CC CATAAT AT CTG AAAGT CAG 

A1.2 347 G AA TG T 

20 A4,l 348 T G G TG G T 

A3 394 TTTCTTC CATCATTTTGTGTATTAAGGTTCTTTATTCCCCTGTTCTCTG CCTTCCT 

A1.2 403 G CT CTC 

A4.1 403 CT TCAT G C CA 

A3 450 GCTTGTCATCTTC ACTCATCAGCTGACCATGTTGCCTCTTACGGTGTAAACTTGTACCAG 

A1.2 459 C GT 

A4.1 462 *C C T . 






A3 510 TCTTATGGTCCCTCTGGGCAGTACAG CCATGAATTTGATGGAGA CGAGGAGTTCTAT 

A1.2 519 T C C C T C C 

A4.1 522 C C C T C C 

A3 567 GTGGACCTGGAGAGGAAGGAGACTGTCTGGCAGTTGCCTCTGTTCCGCAGATTTA 

Al.2.576 C G G GA A A G 

A4.1 579 G TGT G TC A ACA 

A3 622 GAAGATTTGACCCGCAATTTGCACTGACAAACATCGCTGTGCTAAAACATAACTTGA 

A1.2 631 G T GGG G G GC C 

A4.1 634 C 

A3 679 ACATCGTGATTAAACGCTCCAACTCT ACCGCTGCTACCAATGGTATG TGTCCACCATTCTG 

Al-2 688 A A C 

A4.1 688 GTC A A 



DQA1 seq (cont.)



A3 74 0 CCTTTCTTTAC TGATTTATCCCTTTATACCAAGTTTCATTATTTTCTTT 

A1.2 749 C TTAA A GC CC G C 

A4.1 749 CC C A 

A3 789 CCAAGAGGTCCCCAGATC 806 

A1.2 802 819 

A4.1 798 815 



TABLE 3



DQB1 Seq 

1 MGCTTGTGCTCTTTCCATGAATAMTG^ 

GG T T A 



51 GTAGG TCCTTTCCMCATAGMGGGAGTGA ACCTCAACG3G ACTTQGGA G 

TT TT 
C AC C TTT TA C CA AC GTGA CA C 

AT AT C A 

101 QGTAAATCTAGGCATGGGMGGMQGTATTrTACGCAGGGAQZAAGAGAA 

C 

G 

151 Jj^CGCGTGTCAGMCGAGGCCAGG 

G A G - A T G 

A A T CG A 

201 TCCGTTGMCTCIX^GATTTATGTGGATMOT 

C G G C 

C A G T T 

251 GGAGCTTCATGAAAMTGGGATTTCATGCGAGAAOGCCCTGAT CCCTCTA 

C G A 

CA G G T 

301 AGTGCAGAQGTGCATGTAAAATCAQ3CCGACTG^ 

C AT 
CT C C 

351 CAGGCTCAGGCAGGGACAGGGCTTTOCTCCC^ 

CG A CC 
C G CC C 

401 C AGATTCCAGMGXCGCAMGMGGOQGGCAGAGCTGGGC^ 

CG CACCGG G - N N N 

G C C G G G 

451 GGGAG3AT03CAGGTCTGGAG33XA^ 

C G T T 

C A A ,



501 GTCGCGCGQXGGTTCCA^ 

T G T 

G 

551 GGGGCG3033GGCIX3GGGCC TGACTGAO3Q30CG3TGATTCC O2GCAGAG 

A GCA 

GGGGGGGGGOC 

601 GATTTCGTGTAOCAGTTTAAGGGCATGTGGTACTr^ 



651 GCGCGTGCGTCTTGTM0CAGACACATCTATMGD3AGAGGAGTA 

G G AG . A AT T 

G T A 

701 GCTTQ3ACAGCGACGIGQGQ3TC 

AT T T T 

T C 

751 (XlXjT rGCCGAGTACTQGMCAGCX^ 

CC CA AA 

CC 

801 G3CG3AGTTGGA CA CGGTGTGCAGACACMCTACG AGGTGGGGTAOGGCG 

CG G CTACTA 

A C T A CT A 

851 GGATCXTGCAGAGGAGAQGTGAG^ ACCC 

G 

CCT CC GG -TTCQCC 

CCT CC G G GCCT 

901 TTGGCCGGGACCC33AGTCTCTGTQCCGG ATGGG33CGAGGTC 

A CA GCAATTC 

A G A COG GCGAA C C 

951 TCTGAAATCTTGAGCCCAGTTCATTCCAOC 



-C - C G3 

GC TT -CTGC-AA
1001 OGGGGGTGGTGQGGXAGGTGC^TCGGAGGGGOGGGGACCTAGGGCAGAG

COST - C T A 

5 1051 CAGGGGGACMGCAGAGTTGGCCAGGCT^ 

G T- A T G - T 

1101 CTGGT033TCGGCCTCGTOCT^ 
C 

C C C - T 

10 

1151 TATQCGTTIG^^ 

TA 

1201 CCCAGTGCCCACCCTCTIXX^ 

15 ATT G C CG G 

1251 ACCT^GCMGXCCACAGTCGCX3CATrCGCrX3CA GGAAGCTT 1292 

T CG 

G T CTA A AGC CATG AGTGGGAAGCTT 

> 

TABLE 4



DPB1 Seq 



DPB4.1 



7546 



GGGAAGATTTGGGMGAATCGTTAATAT 



DPB4.1 7574 TGAGAGAGAGAGOGAGAMGAG3ATTAGATGAGAGTGXG0CTO:GCTCATGTCn3Xa: 



DPB4.1 7634 CT0CO3XAGAGMTTAOITTTTGCAGGGACGGCAGGAATOCT 
DPB9 GGAT G GCA TT 

New GGAT G GCA TT 

DPv3 



BPB4.1 7694 CAGCGCTTCCTGGAGAGATACATCTACMCCQGGA 

DPB9 T 

New T 

DPw3 

DPB4.1 7754 GTGGGGGAGTTCCGGQCGGTGACGSAGCTQGGG^ 

DPB9 A A C 

New A A C 

DPw3 



DPB4.1 7814 CAGMGGACATCCTQSAGGAGMGCGGGCAGTGCCGGACAGGA^ 
DPB9 G G A 

New C G A 

DPw3 C G A 



DPB4.1 7874 GAGCIGQGCGGGCCCATGACCCTGCAGCG^ 

DPB9 A A G G . 

New A A G G 

25 DPw3 A A G G 



DPB4.1 7934 CCCAGGGCAGCCCCGCGGGCGCGTGCCCAG 



Primers for HLA loci

Exemplary HLA locus-specific primers are listed
below. Each of the primers hybridizes with at least
about 15 consecutive nucleotides of the designated
region of the allele sequence. The designation of an
exemplary preferred primer together with its sequence is
also shown. For many of the primers, the sequence is
not identical for all of the other alleles of the locus.
For each of the following preferred primers, additional
preferred primers have sequences which correspond to the
sequences of the homologous region of other alleles of
the locus or to their complements.
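
Two of the ideas in this paragraph, a primer corresponding to the complement of a region and a primer sharing at least about 15 consecutive nucleotides with it, can be checked computationally. The sketch below is illustrative only: the helper names are assumptions, and exact string matching is used as a crude stand-in for hybridization. The primer sequence is the SGD009.AIVS3.R2NP sequence listed in this specification.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_shared_run(primer: str, region: str) -> int:
    """Length of the longest stretch of consecutive nucleotides common to both."""
    best = 0
    previous = [0] * (len(region) + 1)
    for p in primer:
        current = [0] * (len(region) + 1)
        for j, r in enumerate(region, 1):
            if p == r:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


primer = "CATGTGGCCATCTTGAGAATGGA"  # SGD009.AIVS3.R2NP
target_strand = reverse_complement(primer)
print(target_strand)  # TCCATTCTCAAGATGGCCACATG

# Hypothetical region: the target strand with arbitrary flanking bases
region = "GGTG" + target_strand + "TGTG"
print(longest_shared_run(target_strand, region) >= 15)  # True
```
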

In one embodiment, Class I loci are amplified by
using an A, B or C locus-specific primer together with a
Class I locus-specific primer. The Class I primer
preferably hybridizes with IVS III sequences (or their
complements) or, more preferably, with IVS I sequences
(or their complements). The term "Class I-specific
primer", as used herein, means that the primer
hybridizes with an allele sequence (or its complement)
for at least two different Class I loci and does not
hybridize with Class II locus allele sequences under the
conditions used. Preferably, the Class I primer
hybridizes with at least one allele of each of the A, B
and C loci. More preferably, the Class I primer
hybridizes with a plurality of, most preferably all of,
the Class I allele loci or their complements. Exemplary
Class I locus-specific primers are also listed below.

HLA Primers

A locus-specific primers

allelic location: nt 1735-1757 of A3
designation: SGD009.AIVS3.R2NP
sequence: CATGTGGCCATCTTGAGAATGGA

allelic location: nt 1541-1564 of A2
designation: SGD006.AIVS3.R1NP
sequence: GCCCGGGAGATCTACAGGCGATCA

allelic location: nt 1533-1553 of A2
designation: A2.1
sequence: CGCCTCCCTGATCGCCTGTAG

allelic location: nt 1667-1685 of A2
designation: A2.2
sequence: CCAGAGAGTGACTCTGAGG

allelic location: nt 1704-1717 of A2
designation: A2.3
sequence: CACAATTAAGGGAT



B locus-specific primers

allelic location: nt 1108-1131 of B17
designation: SGD007.BIVS3.R1NP
sequence: TCCCCGGCGACCTATAGGAGATGG

allelic location: nt 1582-1604 of B17
designation: SGD010.BIVS3.R2NP
sequence: CTAGGACCACCCATGTGACCAGC

allelic location: nt 500-528 of B27
designation: B2.1
sequence: ATCTCCTCAGACGCCGAGATGCGTCAC

allelic location: nt 545-566 of B27
designation: B2.2
sequence: CTCCTGCTGCTCTGGGGGGCAG

allelic location: nt 1852-1876 of B27
designation: B2.3
sequence: ACTTTACCTCCACTCAGATCAGGAG

allelic location: nt 1945-1976 of B27
designation: B2.4
sequence: CGTCCAGGCTGGTGTCTGGGTTCTGTGCCCCT

allelic location: nt 2009-2031 of B27
designation: B2.5
sequence: CTGGTCACATGGGTGGTCCTAGG

allelic location: nt 2054-2079 of B27
designation: B2.6
sequence: CGCCTGAATTTTCTGACTCTTCCCAT



C locus-specific primers

allelic location: nt 1182-1204 of C3
designation: SGD008.CIVS3.R1NP
sequence: ATCCCGGGAGATCTACAGGAGATG

allelic location: nt 1665-1687 of C3
designation: SGD011.CIVS3.R2NP
sequence: AACAGCGCCCATGTGACCATCCT

allelic location: nt 499-525 of C1
designation: C2.1
sequence: CTGGGGAGGCGCCGCGTTGAGGATTCT

allelic location: nt 642-674 of C1
designation: C2.2
sequence: CGTCTCCGCAGTCCCGGTTCTAAAGTTCCCAGT



169. 0018 



60 



allelic location: nt 738-755 of C1
designation: C2.3
sequence: ATCCTCGTGCTCTCGGGA

allelic location: nt 1970-1987 of C1
designation: C2.4
sequence: TGTGGTCAGGCTGCTGAC

allelic location: nt 2032-2051 of C1
designation: C2.5
sequence: AAGGTTTGATTCCAGCTT

allelic location: nt 2180-2217 of C1
designation: C2.6
sequence: CCCCTTCCCCACCCCAGGTGTTCCTGTCCATTCTTCAGGA

allelic location: nt 2222-2245 of C1
designation: C2.7
sequence: CACATGGGCGCTGTTGGAGTGTCG

Class I loci-specific primers

allelic location: nt 599-620 of A2
designation: SGD005.IIVS1.LNP
sequence: GTGAGTGCGGGGTCGGGAGGGA

allelic location: nt 489-506 of A2
designation: 1.1
sequence: CACCCACCGGGACTCAGA

allelic location: nt 574-595 of A2
designation: 1.2
sequence: TGGCCCTGACCCAGACCTGGGC

allelic location: nt 691-711 of A2
designation: 1.3
sequence: GAGGGTCGGGCGGGTCTCAGC

allelic location: nt 1816-1831 of A2
designation: 1.4
sequence: CTCTCAGGCCTTGTTC

allelic location: nt 1908-1923 of A2
designation: 1.5
sequence: CAGAAGTCGCTGTTCC

DQA1 locus-specific primers

allelic location: nt 23-41 of DQA3
designation: SGD001.DQA1.LNP
sequence: TTCTGAGCCAGTCCTGAGA

allelic location: nt 45-64 of DQA3
designation: DQA3 E1a
sequence: TTGCCCTGACCACCGTGATG

allelic location: nt 444-463 of DQA3
designation: DQA3 E1b
sequence: CTTCCTGCTTGTCATCTTCA

allelic location: nt 536-553 of DQA3
designation: DQA3 E1c
sequence: CCATGAATTTGATGGAGA

allelic location: nt 705-723 of DQA3
designation: DQA3 E1d
sequence: ACCGCTGCTACCAATGGTA

allelic location: nt 789-806 of DQA3
designation: SGD003.DQA1.RNP
sequence: CCAAGAGGTCCCCAGATC



DRA locus-specific primers

allelic location: nt 49-68 of DRA HUMMHDRAM (1183 nt
sequence, Accession No. K01171)
designation: DRA E1
sequence: TCATCATAGCTGTGCTGATG

allelic location: nt 98-118 of DRA HUMMHDRAM (1183 nt
sequence, Accession No. K01171)
designation: DRA 5'E2 (5' indicates the primer is
used as the 5' primer)
sequence: AGAACATGTGATCATCCAGGC

allelic location: nt 319-341 of DRA HUMMHDRAM (1183 nt
sequence, Accession No. K01171)
designation: DRA 3'E2
sequence: CCAACTATACTCCGATCACCAAT



DRB locus-specific primers

allelic location: nt 79-101 of DRB HUMMHDRC (1153 nt
sequence, Accession No. K01171)
designation: DRB E1
sequence: TGACAGTGACACTGATGGTGCTG

allelic location: nt 123-143 of DRB HUMMHDRC (1153 nt
sequence, Accession No. K01171)
designation: DRB 5'E2
sequence: GGGGACACCCGACCACGTTTC

allelic location: nt 357-378 of DRB HUMMHDRC (1153 nt
sequence, Accession No. K01171)
designation: DRB 3'E2
sequence: TGCAGACACAACTACGGGGTTG



DQB1 locus-specific primers

allelic location: nt 509-532 of DQB1 DQw1 v a
designation: DQB E1
sequence: TGGCTGAGGGCAGAGACTCTCCC

allelic location: nt 628-647 of DQB1 DQw1 v a
designation: DQB 5'E2
sequence: TGCTACTTCACCAACGGGAC

allelic location: nt 816-834 of DQB1 DQw1 v a
designation: DQB 3'E2
sequence: GGTGTGCACACACAACTAC

allelic location: nt 124-152 of DQB1 DQw1 v a
designation: DQB 5'IVS1a
sequence: AGGTATTTTACCCAGGGACCAAGAGAT

allelic location: nt 314-340 of DQB1 DQw1 v a
designation: DQB 5'IVS1b
sequence: ATGTAAAATCAGCCCGACTGCCTCTTC

allelic location: nt 1140-1166 of DQB1 DQw1 v a
designation: DQB 3'IVS2
sequence: GCCTCGTGCCTTATGCGTTTGCCTCCT



DPB1 locus-specific primers

allelic location: nt 6116-6136 of DPB1 4.1
designation: DPB E1
sequence: TGAGGTTAATAAACTGGAGAA

allelic location: nt 7604-7624 of DPB1 4.1
designation: DPB 5'IVS1
sequence: GAGAGTGGCGCCTCCGCTCAT

allelic location: nt 7910-7929 of DPB1 4.1
designation: DPB 3'IVS2
sequence: GAGTGAGGGCTTTGGGCCGG

Primer pairs for HLA analyses 

It is well understood that for each primer pair,
the 5' upstream primer hybridizes with the 5' end of the
sequence to be amplified and the 3' downstream primer
hybridizes with the complement of the 3' end of the
sequence. The primers amplify a sequence between the
regions of the DNA to which the primers bind and its
complementary sequence including the regions to which
the primers bind. Therefore, for each of the primers
described above, whether the primer binds to the
HLA-encoding strand or its complement depends on whether
the primer functions as the 5' upstream primer or the 3'
downstream primer for that particular primer pair.
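The orientation rule for a primer pair can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the template below is an invented toy sequence, not an HLA sequence, and the function names are hypothetical. The upstream primer is written as it reads on the strand shown, and the downstream primer is written as the reverse complement of the 3' end of the region to be amplified.

```python
# Toy illustration of primer-pair orientation.
# TEMPLATE and both primers are invented for this sketch.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def amplified_region(template: str, upstream: str, downstream: str) -> str:
    """Return the part of `template` bounded by the two primers, inclusive.

    `upstream` reads like the template strand itself; `downstream` reads
    like the reverse complement of the 3' end of the amplified region.
    """
    start = template.find(upstream)
    if start == -1:
        raise ValueError("upstream primer site not found")
    # The downstream primer site, expressed on the template strand.
    site = reverse_complement(downstream)
    end = template.find(site, start + len(upstream))
    if end == -1:
        raise ValueError("downstream primer site not found after upstream site")
    return template[start:end + len(site)]


TEMPLATE = "GGATCCATGACCTTGAAGCTTCCGTAGGCATTACGGAATTC"
UPSTREAM = "ATGACCTTG"                         # matches the template strand directly
DOWNSTREAM = reverse_complement("GCATTACGG")   # written 5'->3' as CCGTAATGC

product = amplified_region(TEMPLATE, UPSTREAM, DOWNSTREAM)
print(len(product), product)
```

The amplified product includes both primer-binding regions, which is why the same oligonucleotide can serve as an upstream primer in one pair and, in reverse-complement form, as a downstream primer in another.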

In one embodiment, a Class I locus-specific primer
pair includes a Class I locus-specific primer and an A,
B or C locus-specific primer. Preferably, the Class I
locus-specific primer is the 5' upstream primer and
hybridizes with a portion of the complement of IVS I.
In that case, the locus-specific primer is preferably
the 3' downstream primer and hybridizes with IVS III.
The primer pairs amplify a sequence of about 1.0 to
about 1.5 Kb.

In another embodiment, the primer pair comprises
two locus-specific primers that amplify a DNA sequence
that does not include the variable exon(s). In one
example of that embodiment, the 3' downstream primer and
the 5' upstream primer are Class I locus-specific
primers that hybridize with IVS III and its complement,
respectively. In that case a sequence of about 0.5 Kb
corresponding to the intron sequence is amplified.

Preferably, locus-specific primers for the
particular locus, rather than for the HLA class, are
used for each primer of the primer pair. Due to
differences in the Class II gene sequences,
locus-specific primers which are specific for only one
locus participate in amplifying the DRB, DQA1, DQB and
DPB loci. Therefore, for each of the preferred Class II
locus primer pairs, each primer of the pair participates
in amplifying only the designated locus and no other
Class II loci.

Analytical methods 

In one embodiment, the amplified sequence includes
sufficient intron sequences to encompass length
polymorphisms. The primer-defined length polymorphisms
(PDLPs) are indicative of the HLA locus allele in the
sample. For some HLA loci, use of a single primer pair
produces primer-defined length polymorphisms that
distinguish between some of the alleles of the locus.
For other loci, two or more pairs of primers are used in
separate amplifications to distinguish the alleles. For
other loci, the amplified DNA sequence is cleaved with
one or more restriction endonucleases to distinguish the
alleles. The primer-defined length polymorphisms are
particularly useful in screening processes.

In another embodiment, the invention provides an
improved method that uses PCR amplification of a genomic
HLA DNA sequence of one HLA locus. Following
amplification, the amplified DNA sequence is combined
with at least one endonuclease to produce a digest. The
endonuclease cleaves the amplified DNA sequence to yield
a set of fragments having distinctive fragment lengths.
Usually the amplified sequence is divided, and two or
more endonuclease digests are produced. The digests can
be used, either separately or combined, to produce RFLP
patterns that can distinguish between individuals.
Additional digests can be prepared to provide enhanced
specificity to distinguish between even closely related
individuals with the same HLA type.
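As a minimal sketch of why a digest yields a distinctive set of fragment lengths, the following Python fragment derives the lengths from the positions at which an endonuclease cuts a linear amplified sequence. The amplicon length and cut positions are invented for illustration and are not taken from any HLA allele; the function names are likewise hypothetical.

```python
def fragment_lengths(amplicon_length: int, cut_positions: list[int]) -> list[int]:
    """Lengths of the pieces produced by cutting a linear amplicon.

    Each cut position is the number of nucleotides lying to the left
    of the cut, counted from the 5' end of the amplicon.
    """
    cuts = sorted(set(cut_positions))
    if any(p <= 0 or p >= amplicon_length for p in cuts):
        raise ValueError("cut positions must fall inside the amplicon")
    bounds = [0, *cuts, amplicon_length]
    return [right - left for left, right in zip(bounds, bounds[1:])]


def rflp_pattern(amplicon_length: int, cut_positions: list[int]) -> list[int]:
    """Fragment lengths sorted largest first, as they would be read from a gel."""
    return sorted(fragment_lengths(amplicon_length, cut_positions), reverse=True)


# Two hypothetical alleles of the same 1,000-nt amplified sequence.
# The second allele carries one additional recognition site.
allele_x = rflp_pattern(1000, [400])
allele_y = rflp_pattern(1000, [400, 650])
print(allele_x)  # [600, 400]
print(allele_y)  # [400, 350, 250]
```

A single extra or missing site changes the pattern, which is the basis on which separate digests of the same amplified sequence add discriminating power.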

In a preferred embodiment, the presence of a
particular allele can be verified by performing a two
step amplification procedure in which an amplified
sequence produced by a first primer pair is amplified by
a second primer pair which binds to and defines a
sequence within the first amplified sequence. The first
primer pair can be specific for one or more alleles of
the HLA locus. The second primer pair is preferably
specific for one allele of the HLA locus, rather than a
plurality of alleles. The presence of an amplified
sequence indicates the presence of the allele, which is
confirmed by production of characteristic RFLP patterns.

To analyze RFLP patterns, fragments in the digest
are separated by size and then visualized. In the case
of typing for a particular HLA locus, the analysis is
directed to detecting the two DNA allele sequences that
uniquely characterize that locus in each individual.
Usually this is performed by comparing the sample digest
RFLP patterns to a pattern produced by a control sample
of known HLA allele type. However, when the method is
used for paternity testing or forensics, the analysis
need not involve identifying a particular locus or loci
but can be done by comparing single or multiple RFLP
patterns of one individual with those of another
individual using the same restriction endonuclease and
primers to determine similarities and differences
between the patterns.

The number of digests that need to be prepared for
any particular analysis will depend on the desired
information and the particular sample to be analyzed.
For example, one digest may be sufficient to determine
that an individual cannot be the person whose blood was
found at a crime scene. In general, the use of two to
three digests for each of two to three HLA loci will be
sufficient for matching applications (forensics,
paternity). For complete HLA haplotyping, e.g., for
transplantation, additional loci may need to be
analyzed.

As described previously, combinations of primer
pairs can be used in the amplification method to amplify
a particular HLA DNA locus irrespective of the allele
present in the sample. In a preferred embodiment,
samples of HLA DNA are divided into aliquots containing
similar amounts of DNA per aliquot and are amplified
with primer pairs (or combinations of primer pairs) to
produce amplified DNA sequences for additional HLA loci.
Each amplification mixture contains only primer pairs
for one HLA locus. The amplified sequences are
preferably processed concurrently, so that a number of
digest RFLP fragment patterns can be produced from one
sample. In this way, the HLA type for a number of
alleles can be determined simultaneously.

Alternatively, preparation of a number of RFLP
fragment patterns provides additional comparisons of
patterns to distinguish samples for forensic and
paternity analyses where analysis of one locus
frequently fails to provide sufficient information for
the determination when the sample DNA has the same
allele as the DNA to which it is compared.

The use of HLA types in paternity tests or
transplantation testing and in disease diagnosis and
prognosis is described in Basic & Clinical Immunology,
3rd Ed (1980) Lange Medical Publications, pp 187-190,
which is incorporated herein by reference in its
entirety. HLA determinations fall into two general
categories. The first involves matching of DNA from an
individual and a sample. This category involves
forensic determinations and paternity testing. For
category 1 analysis, the particular HLA type is not as
important as whether the DNA from the individuals is
related. The second category is tissue typing, such
as for use in transplantation. In this case, rejection
of the donated blood or tissue will depend on whether
the recipient and the donor express the same or
different antigens. This is in contrast to first
category analyses, where differences in the HLA DNA in
either the introns or exons are determinative.

For forensic applications, sample DNA of the
suspected perpetrator of the crime and DNA found at the
crime scene are analyzed concurrently and compared to
determine whether the DNA is from the same
individual. The determination preferably includes
analysis of at least three digests of amplified DNA of
the DQA1 locus and preferably also of the A locus. More
preferably, the determination also includes analysis of
at least three digests of amplified DNA of an additional
locus, e.g. the DPB locus. In this way, the probability
of discriminating differences between the DNA samples
is sufficiently high.

For paternity testing, the analysis involves
comparison of DNA of the child, the mother and the
putative father to determine the probability that the
child inherited the obligate haplotype DNA from the
putative father. That is, any DNA sequence in the child
that is not present in the mother's DNA must be
consistent with being provided by the putative father.
Analysis of two to three digests for the DQA1 locus and
preferably also for the A locus is usually sufficient.
More preferably, the determination also includes
analysis of digests of an additional locus, e.g. the DPB
locus.
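The obligate-allele reasoning can be illustrated with a short Python sketch. It assumes that allele types have already been assigned at a single locus; the genotypes shown are invented for illustration, and the function names are hypothetical. It covers only the exclusion logic and does not estimate a probability of paternity.

```python
def obligate_paternal_alleles(child, mother):
    """Alleles at one locus that the child may have received from the father.

    `child` and `mother` are pairs of allele names. For each way of
    crediting one of the child's alleles to the mother, the remaining
    allele is a possible paternal contribution.
    """
    first, second = child
    candidates = set()
    if first in mother:
        candidates.add(second)
    if second in mother:
        candidates.add(first)
    if not candidates:
        raise ValueError("child shares no allele with the stated mother")
    return candidates


def father_not_excluded(child, mother, alleged_father):
    """True when the alleged father carries at least one candidate paternal allele."""
    return any(a in alleged_father
               for a in obligate_paternal_alleles(child, mother))


# Invented DQA1 genotypes, for illustration only.
child = ("0102", "0301")
mother = ("0301", "0501")

print(obligate_paternal_alleles(child, mother))               # {'0102'}
print(father_not_excluded(child, mother, ("0102", "0201")))   # True
print(father_not_excluded(child, mother, ("0201", "0501")))   # False
```

When the child and mother share both alleles, either allele may be paternal, so the function returns both candidates and a single locus is less informative; this is one reason additional loci are analyzed.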

For tissue typing determinations for
transplantation matching, analysis of three loci (HLA A,
B, and DR) is often sufficient. Preferably, the final
analysis involves comparison of additional loci
including DQ and DP.

Production of RFLP fragment patterns

The following table of exemplary fragment pattern
lengths demonstrates distinctive patterns. For example,
as shown in the table, BsrI cleaves A2, A3 and A9 allele
amplified sequences defined by primers SGD005.IIVS1.LNP
and SGD009.AIVS3.R2NP into sets of fragments with the
following numbers of nucleotides: (740, 691), (809, 335,
283) and (619, 462, 256, 93), respectively. The
fragment patterns clearly indicate which of the three A
alleles is present. The following table illustrates a
number of exemplary endonucleases that produce
distinctive RFLP fragment patterns for exemplary A
allele sequences.
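The BsrI example can be turned into a small lookup, sketched below in Python. The three fragment sets are the ones quoted in the preceding paragraph; the function names, the band-matching rule and the optional tolerance parameter are assumptions made for illustration, since real gels resolve fragment lengths only approximately.

```python
# BsrI fragment sets quoted in the text for the amplified sequence
# defined by primers SGD005.IIVS1.LNP and SGD009.AIVS3.R2NP.
BSRI_PATTERNS = {
    "A2": (740, 691),
    "A3": (809, 335, 283),
    "A9": (619, 462, 256, 93),
}


def alleles_consistent_with(observed, patterns=BSRI_PATTERNS, tolerance=0):
    """Alleles whose every expected fragment is matched by an observed band."""
    def seen(expected):
        return any(abs(expected - band) <= tolerance for band in observed)
    return sorted(name for name, frags in patterns.items()
                  if all(seen(f) for f in frags))


def unexplained_bands(observed, alleles, patterns=BSRI_PATTERNS, tolerance=0):
    """Observed bands not accounted for by the called alleles."""
    expected = [f for name in alleles for f in patterns[name]]
    return sorted(b for b in observed
                  if all(abs(b - f) > tolerance for f in expected))


# Summed fragment lengths approximate the amplified sequence length per allele.
for name, frags in BSRI_PATTERNS.items():
    print(name, sum(frags))   # A2 1431, A3 1427, A9 1430

# A sample showing the union of the A3 and A9 fragment sets.
heterozygote = [809, 619, 462, 335, 283, 256, 93]
calls = alleles_consistent_with(heterozygote)
print(calls)                                    # ['A3', 'A9']
print(unexplained_bands(heterozygote, calls))   # []
```

A band left unexplained by any listed allele would suggest an allele outside the reference set, which is the situation in which additional digests become useful.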

Table 2 illustrates the set of RFLP fragments
produced by use of the designated endonucleases for
analysis of three A locus alleles. For each
endonuclease, the number of nucleotides of each of the
fragments in a set produced by the endonuclease is
listed. The first portion of the table illustrates RFLP
fragment lengths using the primers designated
SGD009.AIVS3.R2NP and SGD005.IIVS1.LNP which produce the
longer of the two exemplary sequences. The second
portion of the table illustrates RFLP fragment lengths
using the primers designated SGD006.AIVS3.R1NP and
SGD005.IIVS1.LNP which produce the shorter of the
sequences. The third portion of the table illustrates
the lengths of fragments of a DQA1 locus-specific
amplified sequence defined by the primers designated
SGD001.DQA1.LNP and SGD003.DQA1.RNP.

As shown in the Table, each of the endonucleases
produces a characteristic RFLP fragment pattern which
can readily distinguish which of the three A alleles is
present in a sample.

TABLE 5
RFLP FRAGMENT PATTERNS

A - Long 



BsrI 






A2 
A3 
A9 



740 691 



CfrlOl A2 
A3 
A9 

Drall A2 
A3 
A9 

A2 
A3 
A9 

A2 
A3 
A9 



Fokl 



Gsul 



HphI A2 
A3 
A9 

MboII A2 
A3 
A9 

PpumI A 2 
A3 
A9 



PssI 



A2 
A3 
A9 



809 



1055 



473 



786 



698 



596 427 
728 

515 



1004 



335 283 
619 462 256 93 

399 245 
399 247 
399 

251 138 
369 315 251 247 

251 80 

248 151 
225 213 151 
151 



868 



904 



638 



1040 



419 
643 419 



547 



375 
373 



36 



523 



1011 



419 373 
239 

218 163 
165 143 132 



72 



1349 
698 

695 



893 



676 503 



194 143 

295 251 



115 



51 
138 



369 364 



251 242 
251 



596 427 



295 251 138 
366 315 251 242 

251 






A - Short

BsrI     A2   691  254
         A3   345  335  283
         A9   619  256   93

Cfr10I   A2
         A3
         A9

DraII    A2   295  251  210  138
         A3   315  251  210
         A9   427  251  210

FokI     A2   293  248  151  143  129   51
         A3   225  213  151  143  129   51
         A9   539  151  146  129

GsuI     A2   868   61   36
         A3   904   59
         A9   414  373  178

HphI     A2   554  339
         A3   411  375  177
         A9   414  373  178

MboII    A2
         A3
         A9

PpuMI    A2   295  257  212   69
         A3   364  251  210   72   66
         A9   503  251  211

PssI     A2   295  251  219   72
         A3   315  251  207   72   66
         A9   427  251  208   72



Screening Analysis for Genetic Disease

Carriers of genetic diseases and those affected by
the disease can be identified by use of the present
method. Depending on the disease, the screening
analysis can be used to detect the presence of one or
more alleles associated with the disease or the presence
of haplotypes associated with the disease. Furthermore,
by analyzing haplotypes, the method can detect genetic
diseases that are not associated with coding region
variations but are found in regulatory or other
untranslated regions of the genetic locus. The
screening method is exemplified below by analysis of
cystic fibrosis (CF).

Cystic fibrosis is an autosomal recessive disease,
requiring the presence of a mutant gene on each
chromosome. CF is the most common genetic disease in
Caucasians, occurring once in 2,000 live births. It is
estimated that one in forty Caucasians are carriers for
the disease.

Recently a specific deletion of three adjacent
basepairs in the open reading frame of the putative CF
gene leading to the loss of a phenylalanine residue at
position 508 of the predicted 1480 amino acid
polypeptide was reported [Kerem et al, Science
245:1073-1080 (1989)]. Based on haplotype analysis, the
deletion may account for most CF mutations in Northern
European populations (about 68%). A second mutation is
reportedly prevalent in some Southern European
populations. Additional data indicate that several
other mutations may cause the disease.

Studies of haplotypes of parents of CF patients
(who necessarily have one normal and one
disease-associated haplotype) indicated that there are
at least 178 haplotypes associated with the CF locus.
Of those haplotypes, 90 are associated only with the
disease; 78 are found only in normals; and 10 are
associated with both the disease and with normals
(Kerem et al, supra). The disease apparently is caused
by several different mutations, some in very low
frequency in the population. As demonstrated by the
haplotype information, there are more haplotypes
associated with the locus than there are mutant alleles
responsible for the disease.
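The three-way division of haplotypes described above amounts to simple set arithmetic, sketched here in Python. The counts are those reported in the text; the haplotype labels used in the usage example are invented placeholders, and the function name is hypothetical.

```python
def classify_haplotypes(disease_haplotypes, normal_haplotypes):
    """Split haplotypes into disease-only, normal-only and shared groups."""
    disease, normal = set(disease_haplotypes), set(normal_haplotypes)
    return {
        "disease_only": disease - normal,
        "normal_only": normal - disease,
        "both": disease & normal,
    }


# Counts reported in the text for the CF locus.
REPORTED = {"disease_only": 90, "normal_only": 78, "both": 10}
total = sum(REPORTED.values())
print(total)  # 178

# Invented labels standing in for observed haplotypes.
groups = classify_haplotypes(
    disease_haplotypes=["h1", "h2", "h3"],
    normal_haplotypes=["h3", "h4"],
)
print({name: sorted(members) for name, members in groups.items()})
```

Only haplotypes in the disease-only group are directly informative for screening; a haplotype found in both groups needs further analysis to be resolved.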

A genetic screening program (based on
amplification of exon regions and analysis of the
resultant amplified DNA sequence with probes specific
for each of the mutations or with enzymes producing RFLP
patterns characteristic of each mutation) may take years
to develop. Such tests would depend on detection and
characterization of each of the mutations, or at least
of mutations causing about 90 to 95% or more of the
cases of the disease. The alternative is to detect only
70 to 80% of the CF-associated genes. That alternative
is generally considered unacceptable and is the cause of
much concern in the scientific community.

The present method directly determines haplotypes
associated with the locus and can detect haplotypes
among the 178 currently recognized haplotypes associated
with the disease locus. Additional haplotypes
associated with the disease are readily determined
through the rapid analysis of DNA of numerous CF
patients by the methods of this invention. Furthermore,
any mutations which may be associated with noncoding
regulatory regions can also be detected by the method
and will be identified by the screening process.

Rather than attempting to determine and then
detect each defect in a coding region that causes the
disease, the present method amplifies intron sequences
associated with the locus to determine allelic and
sub-allelic patterns. In contrast to use of
mutation-specific probes where only known sequence
defects can be detected, new PDLP and RFLP patterns
produced by intron sequences indicate the presence of a
previously unrecognized haplotype.

The same analysis can be performed for 
phenylalanine hydroxylase locus mutations that cause
phenylketonuria and for beta-globin mutations that cause 
beta-thalassemia and sickle cell disease and for other 
loci known to be associated with a genetic disease. 
Furthermore, neither the mutation site nor the location 
for a disease gene is required to determine haplotypes 
associated with the disease. Amplified intron sequences 
in the regions of closely flanking RFLP markers, such as 
are known for Huntington's disease and many other 
inherited diseases, can provide sufficient information 
to screen for haplotypes associated with the disease. 

Muscular dystrophy (MD) is a sex-linked disease.
The disease-associated gene comprises a 2.3 million
basepair sequence that encodes a 3,685 amino acid
protein, dystrophin. A map of mutations for 128 of 194
patients (34 with Becker's muscular dystrophy and 160
with Duchenne muscular dystrophy) identified 115
deletions and 13 duplications in the coding region
sequence [Den Dunnen et al, Am. J. Hum. Genet.
45:835-847 (1989)]. Although the disease is associated
with a large number of mutations that vary widely, the
mutations have a non-random distribution in the sequence
and are localized to two major mutation hot spots (Den
Dunnen et al, supra). Further, a recombination hot spot
within the gene sequence has been identified [Grimm et
al, Am. J. Hum. Genet. 45:368-372 (1989)].

For analysis of MD, haplotypes on each side of the
recombination hot spot are preferably determined.
Primer pairs defining amplified DNA sequences are
preferably located within about 1 to 10 Kbp of the
hot spot, on either side of the hot spot. In addition,
due to the large size of the gene, primer pairs defining
amplified DNA sequences are preferably located near each
end of the gene sequence and most preferably also in an
intermediate location on each side of the hot spot. In
this way, haplotypes associated with the disease can be
identified.

Other diseases, particularly malignancies, have
been shown to be the result of an inherited recessive
gene together with a somatic mutation of the normal
gene. One malignancy that is due to such "loss of
heterogeneity" is retinoblastoma, a childhood cancer.
The loss of the normal gene through mutation has been
demonstrated by detection of the presence of one
mutation in all somatic cells (indicating germ cell
origin) and detection of a second mutation in some
somatic cells [Scheffer et al, Am. J. Hum. Genet.
45:252-260 (1989)]. The disease can be detected by
amplifying somatic cell, genomic DNA sequences that
encompass sufficient intron sequence nucleotides. The
amplified DNA sequences preferably encompass intron
sequences located near one or more of the markers
described by Scheffer et al, supra. Preferably, an
amplified DNA sequence located near an intragenic marker
and an amplified DNA sequence located near a flanking
marker are used.

An exemplary analysis for CF is described in
detail in the examples. Analysis of genetic loci for
other monogenic and multigenic genetic diseases can be
performed in a similar manner.

As the foregoing description indicates, the
present method of analysis of intron sequences is
generally applicable to detection of any type of genetic
trait. Other monogenic and multigenic traits can be
readily analyzed by the methods of the present
invention. Furthermore, the analysis methods of the
present method are applicable to all eukaryotic cells,
and are preferably used on those of plants and animals.
Examples of analysis of BoLA (bovine MHC determinants)
further demonstrate the general applicability of the
methods of this invention.

This invention is further illustrated by the 
following specific but non-limiting examples. 
Procedures that are constructively reduced to practice 
are described in the present tense, and procedures that 
have been carried out in the laboratory are set forth in 
the past tense. 



EXAMPLE 1 

Forensic Testing 
DNA extracted from peripheral blood of the
suspected perpetrator of a crime and DNA from blood
found at the crime scene are analyzed to determine
whether the two samples of DNA are from the same
individual or from different individuals.

The extracted DNA from each sample is used to form
two replicate aliquots per sample, each aliquot having
1 μg of sample DNA. Each replicate is combined in a
total volume of 100 μl with a primer pair (1 μg of each
primer), dNTPs (2.5 mM each) and 2.5 units of Taq
polymerase in amplification buffer (50 mM KCl; 10 mM
Tris-HCl, pH 8.0; 2.5 mM MgCl2; 100 μg/ml gelatin) to
form four amplification reaction mixtures. The first
primer pair contains the primers designated
SGD005.IIVS1.LNP and SGD009.AIVS3.R2NP (A
locus-specific). The second primer pair contains the
primers designated SGD001.DQA1.LNP and SGD003.DQA1.RNP
(DQA1 locus-specific). Each primer is synthesized using
an Applied Biosystems model 308A DNA synthesizer. The
amplification reaction mixtures are designated SA
(suspect's DNA, A locus-specific primers), SD (suspect's
DNA, DQA1 locus-specific primers), CA (crime scene DNA,
A locus-specific primers) and CD (crime scene DNA, DQA1
locus-specific primers).

Each amplification reaction mixture is heated to
94°C for 30 seconds. The primers are annealed to the
sample DNA by cooling the reaction mixtures to 65°C for
each of the A locus-specific amplification mixtures and
to 55°C for each of the DQA1 locus-specific
amplification mixtures and maintaining the respective
temperatures for one minute. The primer extension step
is performed by heating each of the amplification
mixtures to 72°C for one minute. The denaturation,
annealing and extension cycle is repeated 30 times for
each amplification mixture.
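The cycling conditions of this example can be written down as a simple program, sketched below in Python. The temperatures, hold times and cycle count are those stated above; the data structure and function names are assumptions made for illustration, and the computed duration covers hold times only, ignoring the time the instrument needs to change temperature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


def cycling_program(anneal_temp_c: float, cycles: int = 30) -> list[Step]:
    """Denature/anneal/extend steps repeated for the given number of cycles."""
    cycle = (
        Step("denature", 94, 30),
        Step("anneal", anneal_temp_c, 60),
        Step("extend", 72, 60),
    )
    return [step for _ in range(cycles) for step in cycle]


def hold_minutes(program: list[Step]) -> float:
    """Total hold time in minutes, excluding temperature ramping."""
    return sum(step.seconds for step in program) / 60


a_locus = cycling_program(anneal_temp_c=65)   # A locus-specific mixtures
dqa1 = cycling_program(anneal_temp_c=55)      # DQA1 locus-specific mixtures

print(len(a_locus), hold_minutes(a_locus))    # 90 75.0
print(dqa1[1])                                # the 55 degree annealing step
```

The two programs differ only in annealing temperature, so the A locus and DQA1 mixtures need separate runs or a block that supports two temperatures.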

Each amplification mixture is aliquoted to prepare
three restriction endonuclease digestion mixtures per
amplification mixture. The A locus reaction mixtures
are combined with the endonucleases BsrI, Cfr10I and
DraII. The DQA1 reaction mixtures are combined with
AluI, CviJI and DdeI.

To produce each digestion mixture, each of three
replicate aliquots of 10 μl of each amplification
mixture is combined with 5 units of the respective
enzyme for 60 minutes at 37°C under conditions
recommended by the manufacturer of each endonuclease.
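The layout of this example (two samples, two primer pairs, three endonucleases per locus) can be enumerated as follows. This Python sketch only tabulates the design described above; the dictionary layout and function names are hypothetical, and the enzyme names use standard spellings.

```python
from itertools import product

SAMPLES = {"S": "suspect's DNA", "C": "crime scene DNA"}
LOCI = {
    "A": {
        "primers": ("SGD005.IIVS1.LNP", "SGD009.AIVS3.R2NP"),
        "enzymes": ("BsrI", "Cfr10I", "DraII"),
    },
    "D": {
        "primers": ("SGD001.DQA1.LNP", "SGD003.DQA1.RNP"),
        "enzymes": ("AluI", "CviJI", "DdeI"),
    },
}


def amplification_mixtures() -> list[str]:
    """Two-letter mixture names: sample code followed by locus code."""
    return [sample + locus for sample, locus in product(SAMPLES, LOCI)]


def digests() -> list[tuple[str, str]]:
    """One (mixture, enzyme) pair per restriction digest."""
    return [(mix, enzyme)
            for mix in amplification_mixtures()
            for enzyme in LOCI[mix[1]]["enzymes"]]


print(amplification_mixtures())   # ['SA', 'SD', 'CA', 'CD']
print(len(digests()))             # 12
```

Four amplification mixtures therefore give twelve digests, three per mixture.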

Following digestion, the three digestion mixtures
for each of the samples (SA, SD, CA and CD) are pooled
and electrophoresed on a 6.5% polyacrylamide gel for 45
minutes at 100 volts. Following electrophoresis, the gel
is stained with ethidium bromide.

The samples contain fragments of the following
lengths:

SA: 786, 619, 596, 462, 427, 399, 256, 251, 93, 80

CA: 809, 786, 619, 596, 473, 462, 427, 399, 369, 335,
315, 283, 256, 251, 247, 93, 80

SD: 388, 338, 332, 277, 219, 194, 122, 102, 89, 79,
64, 55

CD: 587, 449, 388, 338, 335, 332, 277, 271, 219, 194,
187, 122, 102, 99, 89, 88, 79, 65, 64, 55

The analysis demonstrates that the blood from the
crime scene and from the suspected perpetrator are not
from the same individual. The blood from the crime
scene and from the suspected perpetrator are,
respectively, A3, A9, DQA1 0501, DQA1 0301 and A9, A9,
DQA1 0501, DQA1 0501.
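The comparison underlying this conclusion can be reproduced from the fragment lengths listed for the four pooled digests. The Python sketch below uses those listed lengths; the function name is hypothetical, and the exact-match comparison is a simplifying assumption, since bands on a real gel are compared within the resolution of the gel.

```python
# Fragment lengths listed in the text for the pooled digests.
SA = [786, 619, 596, 462, 427, 399, 256, 251, 93, 80]
CA = [809, 786, 619, 596, 473, 462, 427, 399, 369, 335,
      315, 283, 256, 251, 247, 93, 80]
SD = [388, 338, 332, 277, 219, 194, 122, 102, 89, 79, 64, 55]
CD = [587, 449, 388, 338, 335, 332, 277, 271, 219, 194,
      187, 122, 102, 99, 89, 88, 79, 65, 64, 55]


def band_differences(first, second):
    """Bands unique to each pattern, largest first."""
    a, b = set(first), set(second)
    return sorted(a - b, reverse=True), sorted(b - a, reverse=True)


for locus, suspect, scene in (("A", SA, CA), ("DQA1", SD, CD)):
    only_suspect, only_scene = band_differences(suspect, scene)
    print(locus, "suspect only:", only_suspect)
    print(locus, "crime scene only:", only_scene)

same_individual = set(SA) == set(CA) and set(SD) == set(CD)
print(same_individual)  # False
```

Every band in the suspect's patterns also appears in the crime scene patterns, but the crime scene sample shows additional bands at both loci (for example 809, 335 and 283 at the A locus, matching the BsrI pattern of A3), so the two patterns differ.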

EXAMPLE 2 

Paternity Testing 
Chorionic villus tissue was obtained by
transcervical biopsy from a 7-week old conceptus (fetus).
Blood samples were obtained by venepuncture from the
mother (M), and from the alleged father (AF). DNA was
extracted from the chorionic villus biopsy, and from the
blood samples. DNA was extracted from the sample from M
by use of nonionic detergent (Tween 20) and proteinase
K. DNA was extracted from the sample from AF by
hypotonic lysis. More specifically, 100 μl of blood was
diluted to 1.5 ml in PBS and centrifuged to remove buffy
coat. Following two hypotonic lysis treatments
involving resuspension of buffy coat cells in water, the
pellets were washed until redness disappeared.
Colorless pellets were resuspended in water and boiled
for 20 minutes. Five 10 mm chorionic villus fronds were
received. One frond was immersed in 200 μl water. NaOH
was added to 0.05 M. The sample was boiled for 20
minutes and then neutralized with HCl. No further
purification was performed for any of the samples.

The extracted DNA was submitted to PCR for
amplification of sequences associated with the HLA loci
DQA1 and DPB1. The primers used were: (1) as the 5'
primer for the DQA1 locus, the primer designated
SGD001.DQA1.LNP (DQA 5'IVS1) (corresponding to nt 23-39
of the DQA1 0301 allele sequence) and as the 3' primer
for the DQA1 locus, the primer designated
SGD003.DQA1.RNP (DQA 3'IVS2, corresponding to nt 789-806
of the DQA1 0301 sequence); (2) as the DPB primers, the
primers designated 5'IVS1 nt 7604-7624 and 3'IVS2 nt
7910-7929. The amplification reaction mixtures were:
150 ng of each primer; 25 μl of test DNA; 10 mM Tris
HCl, pH 8.3; 50 mM KCl; 1.5 mM MgCl2; 0.01% (w/v)
gelatin; 200 μM dNTPs; water to 100 μl and 2.5 U Taq
polymerase.

The amplification was performed by heating the
amplification reaction mixture to 94°C for 10 minutes
prior to addition of Taq polymerase. For DQA1, the
amplification was performed at 94°C for 30 seconds, then
55°C for 30 seconds, then 72°C for 1 minute for 30
cycles, finishing with 72°C for 10 minutes. For DPB,
the amplification was performed at 96°C for 30 seconds,
then 65°C for 30 seconds, finishing with 65°C for 10
minutes.

Amplification was shown to be technically
satisfactory by test gel electrophoresis, which
demonstrated the presence of double stranded DNA of the
anticipated size in the amplification reaction mixture.
The test gel was 2% agarose in TBE (tris borate EDTA)
buffer, loaded with 15 µl of the amplification reaction
mixture per lane and electrophoresed at 200 V for about
2 hours until the tracker dye had migrated 6 to 7 cm
into the 10 cm gel.

The amplified DQA1 and DPB1 sequences were
subjected to restriction endonuclease digestion using
DdeI and MboII (8 and 12 units, respectively, at 37°C for
3 hours) for DQA1, and RsaI and FokI (8 and 11 units,
respectively, at 37°C overnight) for DPB1, in 0.5 to
2.0 µl of the enzyme buffers recommended by the supplier
(Pharmacia), together with 16-18 µl of the amplified
product. The digested DNA was separated by fragment
length by gel electrophoresis (3% Nusieve). The RFLP
patterns were examined under ultraviolet light after
staining the gel with ethidium bromide.

Fragment pattern analysis was performed by allele
assignment of the non-maternal alleles using expected
fragment sizes based on the sequences of known
endonuclease restriction sites. The fragment pattern
analysis revealed the obligate paternal DQA1 allele to
be DQA1 0102 and the obligate paternal DPB allele to be
DPw1. The fragment patterns were consistent with AF
being the biological father.

To calculate the probability of true paternity,
HLA types were assigned. Maternal and AF DQA1 types
were consistent with those predicted from the HLA Class
II gene types determined by serological testing using
lymphocytotoxic antisera.

Considering alleles of the two HLA loci as being
in linkage equilibrium, the combined probability of
non-paternity was given by:

     0.042 × 0.314 = 0.013

i.e. the probability of paternity is (1 - 0.013) or
98.7%.
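
The arithmetic above can be reproduced with a short sketch. It assumes, as the text does, that the two loci are in linkage equilibrium, so the per-locus non-paternity probabilities can simply be multiplied. The function name `combined_non_paternity` is hypothetical and not part of the described method; the two input values are the ones quoted above.

```python
def combined_non_paternity(per_locus):
    """Multiply per-locus non-paternity probabilities.

    Assumes the loci are independent (linkage equilibrium),
    as stated in the text.
    """
    result = 1.0
    for p in per_locus:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
        result *= p
    return result


# Values quoted in the text for the two HLA loci.
p_non = combined_non_paternity([0.042, 0.314])
p_pat = 1.0 - p_non

print(f"combined non-paternity: {p_non:.3f}")  # 0.013
print(f"probability of paternity: {p_pat:.1%}")  # 98.7%
```

Adding a further independent locus would only require appending its non-paternity probability to the list.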

The relative chance of paternity is thus 74:75,
i.e. the chance that the AF is not the biological father
is approximately 1 in 75. The parties to the dispute
chose to regard these results as confirming the
paternity of the fetus by the alleged father.



EXAMPLE 3

Analysis of the HLA DQA1 Locus
The three haplotypes of the HLA DQA1 0102 locus
were analyzed as described below. Those haplotypes are
DQA1 0102 DR15 Dw2; DQA1 0102 DR16 Dw21; and DQA1 0102
DR13 Dw19. The distinction between the haplotypes is
particularly difficult because there is a one basepair
difference between the 0102 alleles and the 0101 and
0103 alleles, which difference is not unique in DQA1
allele sequences.

The procedure used for the amplification is the
same as that described in Example 1, except that the
amplification used thirty cycles of 94°C for 30 seconds,
60°C for 30 seconds, and 72°C for 60 seconds. The
sequences of the primers were:

     SGD 001:  5' TTCTGAGCCAGTCCTGAGA 3'; and
     SGD 003:  5' GATCTGGGGACCTCTTGG 3'.

These primers hybridize to sequences about 500 bp
upstream from the 5' end of the second exon and 50 bp
downstream from the second exon and produce amplified
DNA sequences in the 700 to 800 bp range.
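
A small sketch for handling the two primer sequences is given here as a convenience. It computes the reverse complement of a primer, which is useful when comparing primer listings written on opposite strands: the SGD003.DQA1.RNP entry in the primer list of Example 8 reads CCAAGAGGTCCCCAGATC, which is the reverse complement of the SGD 003 sequence shown above. The helper name `reverse_complement` is an assumption of this sketch.

```python
# Translation table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


SGD001 = "TTCTGAGCCAGTCCTGAGA"  # 5' primer, as listed in this Example
SGD003 = "GATCTGGGGACCTCTTGG"   # 3' primer, as listed in this Example

print(len(SGD001), len(SGD003))    # 19 18
print(reverse_complement(SGD003))  # CCAAGAGGTCCCCAGATC
```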

Following amplification, the amplified DNA
sequences were electrophoresed on a 4% polyacrylamide
gel to determine the PDLP type. In this case, amplified
DNA sequences for 0102 comigrate with (are the same
length as) 0101 alleles and subsequent enzyme digestion
is necessary to distinguish them.

The amplified DNA sequences were digested using
the restriction enzyme AluI (Bethesda Research
Laboratories), which cleaves DNA at the sequence AGCT.
The digestion was performed by mixing 5 units (1 µl) of
enzyme with 10 µl of the amplified DNA sequence (between
about 0.5 and 1 µg of DNA) in the enzyme buffer provided
by the manufacturer according to the manufacturer's
directions to form a digest. The digest was then
incubated for 2 hours at 37°C for complete enzymatic
digestion.
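
For illustration, the expected fragment lengths of such a digest can be predicted from a known sequence. The sketch below cuts a linear sequence at every AGCT site, placing the cut between the G and the C (AluI leaves blunt ends). The function `digest` and the toy sequence are hypothetical; the toy sequence is not a DQA1 sequence and is used only to show the bookkeeping.

```python
def digest(seq, site="AGCT", cut_offset=2):
    """Return fragment lengths after cutting a linear sequence at each site.

    cut_offset is the number of bases of the site that stay on the
    left-hand fragment (2 for AluI, which cuts AG^CT).
    """
    seq = seq.upper()
    cuts = []
    start = 0
    while True:
        i = seq.find(site, start)
        if i == -1:
            break
        cuts.append(i + cut_offset)
        start = i + 1
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Toy sequence with two AluI sites (illustrative only).
toy = "T" * 10 + "AGCT" + "C" * 20 + "AGCT" + "G" * 5
print(digest(toy))  # [12, 24, 7]
```

The fragment lengths always sum to the length of the input, which gives a quick consistency check on any predicted pattern.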

The products of the digestion reaction were mixed
with approximately 0.1 µg of "ladder" nucleotide
sequences (nucleotide control sequences beginning at
123 bp in length and increasing in length by 123 bp to a
final size of about 5,000 bp; available commercially
from Bethesda Research Laboratories, Bethesda MD) and
were electrophoresed using a 4% horizontal ultra-thin
polyacrylamide gel (E-C Apparatus, Clearwater FL).
The bands in the gel were visualized (stained) using
the silver stain technique [Allen et al, BioTechniques
7:736-744 (1989)].

Three distinctive fragment patterns which
correspond to the three haplotypes were produced using
AluI. The patterns (in base pair sized fragments) were:

1. DR15 DQ6 Dw2:   120, 350, 370, 480

2. DR13 DQ6 Dw19:  120, 330, 350, 480

3. DR16 DQ6 Dw21:  120, 330, 350
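
The assignment of a haplotype from an observed band pattern can be sketched as a table lookup. The three patterns are the ones listed above; the function `match_haplotype` and the 5 bp sizing tolerance are assumptions made for this sketch, not values taken from the procedure. The sketch handles a single-haplotype pattern only; a sample carrying two different haplotypes would show the combined bands of both.

```python
# AluI fragment patterns (bp) listed above for the three DQA1 0102 haplotypes.
ALUI_PATTERNS = {
    "DR15 DQ6 Dw2": (120, 350, 370, 480),
    "DR13 DQ6 Dw19": (120, 330, 350, 480),
    "DR16 DQ6 Dw21": (120, 330, 350),
}


def match_haplotype(observed, tolerance=5):
    """Return haplotypes whose pattern matches the observed band sizes.

    tolerance is a hypothetical allowance (bp) for gel sizing error.
    """
    observed = sorted(observed)
    matches = []
    for name, expected in ALUI_PATTERNS.items():
        if len(expected) != len(observed):
            continue
        if all(abs(o - e) <= tolerance for o, e in zip(observed, expected)):
            matches.append(name)
    return matches


print(match_haplotype([480, 120, 352, 331]))  # ['DR13 DQ6 Dw19']
print(match_haplotype([120, 330, 350]))       # ['DR16 DQ6 Dw21']
```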

The procedure was repeated using a 6.5% vertical
polyacrylamide gel and ethidium bromide stain and
provided the same results. However, the fragment
patterns were more readily distinguishable using the
ultrathin gels and silver stain.

This exemplifies analysis according to the method
of this invention. Using the same procedure, 20 of the
other 32 DR/DQ haplotypes for DQA1 were identified using
the same primer pair and two additional enzymes (DdeI
and MboII). PDLP groups and fragment patterns for each
of the DQA1 haplotypes with the three endonucleases are
illustrated in Table 6.

This example illustrates the ability of the method
of this invention to distinguish the alleles and
haplotypes of a genetic locus. Specifically, the
example shows that PDLP analysis stratifies five of the
eight alleles. These three restriction endonuclease
digests distinguish each of the eight alleles and many
of the 35 known haplotypes of the locus. The use of
additional endonuclease digests for this amplified DNA
sequence can be expected to distinguish all of the known
haplotypes and to potentially identify other previously
unrecognized haplotypes. Alternatively, use of the same
or other endonuclease digests for another amplified DNA
sequence in this locus can be expected to distinguish
the haplotypes.

In addition, analysis of amplified DNA sequences
at the DRA locus in the telomeric direction and DQB in
the centromeric direction, preferably together with
analysis of a central locus, can readily distinguish all
of the haplotypes for the region.

The same methods are readily applied to other
loci.

EXAMPLE 4 

Analysis of the HLA DQA1 Locus
The DNA of an individual is analyzed to determine
which of the three haplotypes of the HLA DQA1 0102 locus
are present. Genomic DNA is amplified as described in
Example 3. Each of the amplified DNA sequences is
sequenced to identify the haplotypes of the individual.
The individual is shown to have the haplotypes DR15 DQ6
Dw2; DR13 DQ6 Dw19.

The procedure is repeated as described in Example
3 through the production of the AluI digest. Each of the
digest fragments is sequenced. The individual is shown
to have the haplotypes DR15 DQ6 Dw2; DR13 DQ6 Dw19.

EXAMPLE 5

DQA1 Allele-Specific Amplification
Primers were synthesized that specifically bind
the 0102 and 0301 alleles of the DQA1 locus. The 5'
primer was the SGD 001 primer used in Example 3. The
sequences of the 3' primers are listed below.
     0102   5' TTGCTGAACTCAGGCCACC 3'
     0301   5' TGCGGAACAGAGGCAACTG 3'
The amplification was performed as described in Example
3 using 30 cycles of a standard (94°C, 60°C, 72°C) PCR
reaction. The template DNAs for each of the 0101, 0301
and 0501 alleles were amplified separately. As
determined by gel electrophoresis, the 0102-allele-specific
primer amplified only template 0102 DNA and the
0301-allele-specific primer amplified only template 0301
DNA. Thus, each of the primers was allele-specific.
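
The logic of allele-specific priming can be sketched in a simplified form: a primer pair yields product only if the 5' primer sequence and the reverse complement of the 3' primer both occur on the template strand, in that order. The templates below are toy constructs assembled from the primer sequences and an arbitrary filler; they are not DQA1 allele sequences, and exact matching is a simplification of real primer annealing. The names `amplifies` and `toy_template` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


PRIMER_5 = "TTCTGAGCCAGTCCTGAGA"  # SGD 001
PRIMERS_3 = {
    "0102": "TTGCTGAACTCAGGCCACC",
    "0301": "TGCGGAACAGAGGCAACTG",
}


def amplifies(template, forward, reverse):
    """True if both primers have exact sites on the template, correctly ordered."""
    f = template.find(forward)
    r = template.find(reverse_complement(reverse))
    return f != -1 and r != -1 and f < r


def toy_template(allele):
    """Build an illustrative template carrying the site for one 3' primer."""
    filler = "ACGT" * 10  # arbitrary spacer, not real sequence
    return PRIMER_5 + filler + reverse_complement(PRIMERS_3[allele])


for template_allele in PRIMERS_3:
    for primer_allele, primer in PRIMERS_3.items():
        result = amplifies(toy_template(template_allele), PRIMER_5, primer)
        print(template_allele, primer_allele, result)
```

In this toy setting each 3' primer gives product only on the template built for the same allele, which mirrors the specificity reported above.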

EXAMPLE 6 

Detection of Cystic Fibrosis
The procedure used for the amplification described
in Example 3 is repeated. The sequences of the primers
are illustrated below. The first two primers are
upstream primers, and the third is a downstream primer.
The primers amplify a DNA sequence that encompasses all
of intervening sequence 1.

     5' CAG AGG TCG CCT CTG GA 3';
     5' AAG GCC AGC GTT GTC TCC A 3'; and
     3' CCT CAA AAT TGG TCT GGT 5'.

These primers hybridize to the complement of sequences
located from nt 136-152 and nt 154-172, and to nt
187-207. [The nucleotide numbers are found in Riordan et
al, Science 245:1066-1072 (1989).]

Following amplification, the amplified DNA
sequences are electrophoresed on a 4% polyacrylamide gel
to determine the PDLP type. The amplified DNA sequences
are separately digested using each of the restriction
enzymes AluI, MnlI and RsaI (Bethesda Research
Laboratories). The digestion is performed as described
in Example 3. The products of the digestion reaction
are electrophoresed and visualized using a 4% horizontal
ultra-thin polyacrylamide gel and silver stain as
described in Example 3.

Distinctive fragment patterns which correspond to
disease-associated and normal haplotypes are produced.



EXAMPLE 7

Analysis of Bovine HLA Class I
Bovine HLA Class I alleles and haplotypes are
analyzed in the same manner as described in Example 3.
The primers are listed below.

Bovine Primers (Class I HLA homolog)               Tm
5' primer:     5' TCC TGG TCC TGA CCG AGA 3'      (62°)
3' primer: 1)  3' A TGT GCC TTT GGA GGG TCT 5'    (62°)
               (for ~600 bp product)
           2)  3' GCC AAC AT GAT CCG CAT 5'       (62°)
               (for ~900 bp product)
For the approximately 900 bp sequence, PDLP
analysis is sufficient to distinguish alleles 1 and 3
(893 and 911 bp, respectively). Digests are prepared as
described in Example 3 using AluI and DdeI. The
following patterns are produced for the 900 bp sequence.

Allele 1, AluI digest: 712, 181
Allele 3, AluI digest: 430, 300, 181

Allele 1, DdeI digest: 445, 201, 182, 28
Allele 3, DdeI digest: 406, 185, 182, 28, 16
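
The listed patterns can be tabulated to show which fragments separate the two alleles and how much of each amplified sequence the listed fragments account for. This is only a bookkeeping sketch over the numbers given above; the names `unaccounted` and `distinguishing` are hypothetical. For AluI the listed fragments sum to the full 893 bp and 911 bp lengths. For DdeI the listed fragments sum to 856 bp and 817 bp, and the text does not state how the remainder is distributed.

```python
# Amplified sequence lengths (bp) and digest patterns as listed above.
AMPLICON_BP = {"allele 1": 893, "allele 3": 911}
DIGESTS = {
    ("allele 1", "AluI"): [712, 181],
    ("allele 3", "AluI"): [430, 300, 181],
    ("allele 1", "DdeI"): [445, 201, 182, 28],
    ("allele 3", "DdeI"): [406, 185, 182, 28, 16],
}


def unaccounted(allele, enzyme):
    """Base pairs of the amplified sequence not covered by listed fragments."""
    return AMPLICON_BP[allele] - sum(DIGESTS[(allele, enzyme)])


def distinguishing(enzyme):
    """Fragments unique to allele 1 and to allele 3 for one enzyme."""
    a = set(DIGESTS[("allele 1", enzyme)])
    b = set(DIGESTS[("allele 3", enzyme)])
    return sorted(a - b), sorted(b - a)


for enzyme in ("AluI", "DdeI"):
    print(enzyme, distinguishing(enzyme),
          unaccounted("allele 1", enzyme), unaccounted("allele 3", enzyme))
```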

The 600 bp sequence also produces distinguishable
fragment patterns for those alleles. However, those
patterns are not as dramatically different as the
patterns produced by the 900 bp sequence digests.

EXAMPLE 8 

Preparation of Primers

Each of the following primers is synthesized using
an Applied Biosystems model 308A DNA synthesizer.

HLA locus primers

A locus-specific primers
SGD009.AIVS3.R2NP    CATGTGGCCATCTTGAGAATGGA
SGD006.AIVS3.R1NP    GCCCGGGAGATCTACAGGCGATCA
A2.1                 CGCCTCCCTGATCGCCTGTAG
A2.2                 CCAGAGAGTGACTCTGAGG
A2.3                 CACAATTAAGGGAT

B locus-specific primers
SGD007.BIVS3.R1NP    TCCCCGGCGACCTATAGGAGATGG
SGD010.BIVS3.R2NP    CTAGGACCACCCATGTGACCAGC
B2.1                 ATCTCCTCAGACGCCGAGATGCGTCAC
B2.2                 CTCCTGCTGCTCTGGGGGGCAG
B2.3                 ACTTTACCTCCACTCAGATCAGGAG
B2.4                 CGTCCAGGCTGGTGTCTGGGTTCTGTGCCCCT
B2.5                 CTGGTCACATGGGTGGTCCTAGG
B2.6                 CGCCTGAATTTTCTGACTCTTCCCAT

C locus-specific primers
SGD008.CIVS3.R1NP    ATCCCGGGAGATCTACAGGAGATG
SGD011.CIVS3.R2NP    AACAGCGCCCATGTGACCATCCT
C2.1                 CTGGGGAGGCGCCGCGTTGAGGATTCT
C2.2                 CGTCTCCGCAGTCCCGGTTCTAAAGTTCCCAGT
C2.3                 ATCCTCGTGCTCTCGGGA
C2.4                 TGTGGTCAGGCTGCTGAC
C2.5                 AAGGTTTGATTCCAGCTT
C2.6                 CCCCTTCCCCACCCCAGGTGTTCCTGTCCATTCTTCAGGA
C2.7                 CACATGGGCGCTGTTGGAGTGTCG

Class I loci-specific primers
SGD005.IIVS1.LNP     GTGAGTGCGGGGTCGGGAGGGA
1.1                  CACCCACCGGGACTCAGA
1.2                  TGGCCCTGACCCAGACCTGGGC
1.3                  GAGGGTCGGGCGGGTCTCAGC
1.4                  CTCTCAGGCCTTGTTC
1.5                  CAGAAGTCGCTGTTCC

DQA1 locus-specific primers
SGD001.DQA1.LNP      TTCTGAGCCAGTCCTGAGA
DQA3 E1a             TTGCCCTGACCACCGTGATG
DQA3 E1b             CTTCCTGCTTGTCATCTTCA
DQA3 E1c             CCATGAATTTGATGGAGA
DQA3 E1d             ACCGCTGCTACCAATGGTA
SGD003.DQA1.RNP      CCAAGAGGTCCCCAGATC

DRA locus-specific primers
DRA E1               TCATCATAGCTGTGCTGATG
DRA 5'E2             AGAACATGTGATCATCCAGGC
DRA 3'E2             CCAACTATACTCCGATCACCAAT

DRB locus-specific primers
DRB E1               TGACAGTGACACTGATGGTGCTG
DRB 5'E2             GGGGACACCCGACCACGTTTC
DRB 3'E2             TGCAGACACAACTACGGGGTTG

DQB1 locus-specific primers
DQB E1               TGGCTGAGGGCAGAGACTCTCCC
DQB 5'E2             TGCTACTTCACCAACGGGAC
DQB 3'E2             GGTGTGCACACACAACTAC
DQB 5'IVS1a          AGGTATTTTACCCAGGGACCAAGAGAT
DQB 5'IVS1b          ATGTAAAATCAGCCCGACTGCCTCTTC
DQB 3'IVS2           GCCTCGTGCCTTATGCGTTTGCCTCCT

DPB1 locus-specific primers
DPB E1               TGAGGTTAATAAACTGGAGAA
DPB 5'IVS1           GAGAGTGGCGCCTCCGCTCAT
DPB 3'IVS2           GAGTGAGGGCTTTGGGCCGG